Quantile-based classifiers 



Christian Hennig and Cinzia Viroli 






Abstract 

Quantile classifiers for potentially high-dimensional data are defined by classifying
an observation according to a sum of appropriately weighted component-wise
distances of the components of the observation to the within-class quantiles. An
optimal percentage for the quantiles can be chosen by minimizing the misclassification
error in the training sample.

It is shown that this is consistent, for $n \to \infty$, for the classification rule with
asymptotically optimal quantile, and that, under some assumptions, for $p \to \infty$ the
probability of correct classification converges to one. The role of skewness of the
involved variables is discussed, which leads to an improved classifier.

The optimal quantile classifier performs very well in a comprehensive simulation
study and a real data set from chemistry (classification of bioaerosols) compared
to nine other classifiers, including the support vector machine and the recently
proposed median-based classifier (Hall et al., 2009), which inspired the quantile
classifier.

KEY WORDS: median-based classifier, high-dimensional data, misclassification 
rate, skewness 



1 Introduction 

Supervised classification is a major issue in statistics and has received wide interest in
the scientific literature of many disciplines.



The "large microcosm" of classification methods (Hand, 1997) can be broadly divided
into parametric methods, which make distributional assumptions about the data, and
nonparametric methods, which alternatively concentrate on the local vicinity of the point
to be classified, such as nearest neighbour methods (Cover and Hart, 1967) and kernel
smoothing (Mika et al., 1999).

Parametric methods use the estimated class conditional distributions for the construction
of the classification rule. The traditional linear and quadratic discriminant analysis,
mixture discriminant analysis (Hastie and Tibshirani, 1996), the naive Bayes probabilistic
model (John and Langley, 1995; Hand and Yu, 2001), model-based discriminant analysis
(Bensmail and Celeux, 1996; Fraley and Raftery, 2002) and nonlinear neural networks
(Ripley, 1994) are examples of such methods. See also Friedman (1989); Guo et al. (2007);
Cai and Liu (2011) and the references therein. Implementing such methods in
high-dimensional settings, which are very common nowadays, can be cumbersome and
computationally demanding, because of the well-known curse of dimensionality (Bellman, 1961).
A great deal of work, especially on distance-based methods, has been carried out to try
to circumvent this problem. Distance-based classifiers only use partial information of
the class conditional distributions, typically central moments. Centroid-based methods
have been successfully used for gene expression data (Tibshirani et al., 2002; Dudoit et al.,
2002; Dabney, 2005; Fan and Fan, 2008). Median-based classifiers (Jörnsten, 2004; Ghosh
and Chaudhuri, 2005) represent a more robust alternative in problems where distributions
have heavy tails. Hall et al. (2009) proposed a component-wise median-based classifier
which behaves well in high-dimensional space. It assigns a new observed vector to the class
having the smallest $L_1$-distance from the class conditional component-wise median vectors
of the training set.

All these methods consider the distance from the "core" of a distribution as the ma- 
jor source of the discriminatory information. But tails may be important as well and 
may contain relevant information. It may therefore be fruitful to go beyond the central 
moments. 

In this work we define and explore a family of classifiers based on the quantiles of the
class conditional distributions. The idea was originally inspired by the component-wise
median classifier (Hall et al., 2009).



More specifically, by using the natural distance for quantiles, we will obtain the
component-wise quantile classifier as a function of the $\theta$-quantile, $\theta \in [0, 1]$.
The optimal $\theta$ chosen in the training set will define the empirically optimal quantile
classifier. We will prove the consistency of this choice for the $\theta$ that yields the
optimal true correct classification probability as $n \to \infty$. We will also show under
certain assumptions that the correct classification probability converges to one as
$p \to \infty$ together with the sample size, similarly to what Hall et al. (2009) did for the
component-wise median classifier.



The paper is organized as follows. In Section 2 we review the distance-based classifiers 
and define the proposed quantile classifier. The theoretical properties of the method are 
explored in Sections 3 and 4. A large simulation study and a real application are presented 
in Section 5. 



2 The classification rule 
2.1 Distance-based classifiers 

We consider the problem of constructing a quantile distance-based discriminant rule for
classifying new observations into one of g populations or classes. Without loss of generality
we discuss the problem for g = 2. Generalization for g > 2 is straightforward.
Let $\Pi_0$ and $\Pi_1$ be two populations with probability densities $P_0$ and $P_1$ on
$\mathbb{R}^p$. Distance-based classifiers (Jörnsten, 2004; Tibshirani et al., 2003; Hall et al.,
2009) assign a new data value $z = (z_1, \ldots, z_p)$ to the population from which it has
lowest distance. More specifically, the decision rule allocates z to $\Pi_0$ if

$$\sum_{j=1}^{p} \{ d(z_j, Y_j) - d(z_j, X_j) \} > 0, \qquad (1)$$



where $X = (X_1, \ldots, X_p)$ and $Y = (Y_1, \ldots, Y_p)$ are p-variate random variables from
populations $\Pi_0$ and $\Pi_1$ and $d(\cdot)$ denotes a specific distance measure. Expression (1)
represents a rather general discriminant rule formulation that includes centroid classifiers
(Tibshirani et al., 2002, 2003; Wang and Zhu, 2007), the recent component-wise median-based
classifiers (Hall et al., 2009), and other variants by differently specifying the distance
measure $d(\cdot)$. On the other hand, summing up component-wise differences means
that correlation between variables is not taken into account. If p is small and there are
many observations, this is rather restrictive. However, if p is large and the number of
observations is rather low, it can be effective to avoid overfitting. By considering the
Euclidean distance between $z_j$ and the expectations of $X_j$ and $Y_j$, the component-wise
centroid classifier assigns z to $\Pi_0$ if

$$\sum_{j=1}^{p} \{ (z_j - E(Y_j))^2 - (z_j - E(X_j))^2 \} > 0, \qquad (2)$$

and to $\Pi_1$ otherwise. By taking the $L_1$ (Manhattan)-distance between $z_j$ and the medians
of $X_j$ and $Y_j$ the component-wise median-based classification rule can be defined as

$$\sum_{j=1}^{p} \{ |z_j - \mathrm{med}(Y_j)| - |z_j - \mathrm{med}(X_j)| \} > 0. \qquad (3)$$

Note that in realistic situations neither $P_0$ and $P_1$ nor their moments are known. We
rather observe two sets $x_1, \ldots, x_{n_0}$ and $y_1, \ldots, y_{n_1}$ from $\Pi_0$ and $\Pi_1$; they
represent the training data samples from which the desirable moments must be inferred.
For instance, the sample version of the centroid classifier assigns z to $\Pi_0$ if

$$\sum_{j=1}^{p} \{ (z_j - \bar{y}_j)^2 - (z_j - \bar{x}_j)^2 \} > 0, \qquad (4)$$

where $\bar{y}_j$ and $\bar{x}_j$ denote the jth component of the sample mean vectors. Analogously,
the sample version of the discriminant rule (3) requires computing the empirical
component-wise medians. Hall et al. (2009) stated that median classifiers are more robust
against heavy tails of the data distribution than centroid classifiers, thanks to the metric
$L_1$ instead of $L_2$, and they provided a formal proof of the fact that asymptotically the
correct decision is made by the rule with probability one, if the dimension as well as the
numbers of observations in both classes tend to infinity under some further assumptions.

The choice of the metric $L_1$, instead of $L_2$, in the median classifier addresses the need
of consistency between metric and related minimizer moment; in fact, the mean vector
(centroid) is the statistic that minimizes the sum of $L_2$-distances of points to the
centroid, whereas the median minimizes the sum of the corresponding $L_1$-distances. Hybrid
alternatives may exist, such as an $L_1$-version of the centroid classifier. However, they
look convincing from neither a theoretical nor a practical point of view. Not only does a
hybrid alternative mismatch the relation between metric and related minimizer quantity,
but it also seems to produce higher misclassification rates in practice (see, for instance,
Hall et al. (2009)).
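As an illustration of the sample versions of the centroid rule (4) and the median rule (3), the following minimal Python sketch may help. The function name `componentwise_rule` and the toy data are assumptions made only for this example; the outlier in the first sample is chosen to show how the centroid, unlike the median, can be dragged away by a single extreme point.

```python
import numpy as np

def componentwise_rule(z, X, Y, kind="median"):
    """Return 0 if z is assigned to population 0 (sample X), else 1 (sample Y)."""
    z, X, Y = (np.asarray(a, dtype=float) for a in (z, X, Y))
    if kind == "centroid":
        cx, cy = X.mean(axis=0), Y.mean(axis=0)
        # sample centroid rule: sum_j {(z_j - ybar_j)^2 - (z_j - xbar_j)^2} > 0
        score = np.sum((z - cy) ** 2 - (z - cx) ** 2)
    elif kind == "median":
        mx, my = np.median(X, axis=0), np.median(Y, axis=0)
        # sample median rule: sum_j {|z_j - med(y_j)| - |z_j - med(x_j)|} > 0
        score = np.sum(np.abs(z - my) - np.abs(z - mx))
    else:
        raise ValueError("kind must be 'centroid' or 'median'")
    return 0 if score > 0 else 1

# Hypothetical toy data: class 0 sits at the origin but has one gross outlier.
X = np.array([[0, 0], [0, 0], [0, 0], [0, 0], [100, 100]])
Y = np.full((5, 2), 2.0)
z_new = np.array([0.5, 0.5])

label_median = componentwise_rule(z_new, X, Y, kind="median")
label_centroid = componentwise_rule(z_new, X, Y, kind="centroid")
```

In this toy setting the point `z_new` is closer to the bulk of the first sample, and only the median-based rule reflects that.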



2.2 The quantile classifier 

We introduce the family of the component-wise quantile classifiers that includes the
median classifier as special case. By definition, the $\theta$th quantile of a univariate random
variable X with probability distribution function $F_X$, denoted by $q_X(\theta)$, is the solution to
$q_X(\theta) = F_X^{-1}(\theta) = \inf\{x : F_X(x) \ge \theta\}$, with $\theta \in [0, 1]$. Analogously to
the roles of median and centroid with respect to the $L_1$- and $L_2$-metric, the $\theta$ quantile
of $F_X$ is the value q that minimizes the following population distance

$$\theta \int_{x > q} |x - q| \, dF_X(x) + (1 - \theta) \int_{x < q} |x - q| \, dF_X(x). \qquad (5)$$

This can be easily proven by observing that (5) is minimized for $F_X(q) = \theta$. Given a set
of observations $x_1, x_2, \ldots, x_n$, the empirical $\theta$th quantile of X can be found by
minimizing the sample counterpart of (5):

$$\theta \sum_{x_i > q} |x_i - q| + (1 - \theta) \sum_{x_i < q} |x_i - q| = \sum_{i=1}^{n} \left(\theta + (1 - 2\theta) 1_{[x_i < q]}\right) |x_i - q|. \qquad (6)$$
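The claim that the empirical quantile minimizes the sample distance (6) can be checked numerically. The sketch below is only an illustration under assumptions of our own: the helper name `quantile_distance`, the simulated exponential sample and the choice $\theta = 0.25$ are not taken from the paper. Because (6) is piecewise linear in q, it suffices to search over the observed values.

```python
import numpy as np

def quantile_distance(x, q, theta):
    """Sample distance (6): sum_i (theta + (1 - 2*theta) * 1[x_i < q]) * |x_i - q|."""
    x = np.asarray(x, dtype=float)
    weights = theta + (1 - 2 * theta) * (x < q)
    return np.sum(weights * np.abs(x - q))

rng = np.random.default_rng(0)
x = rng.exponential(size=201)      # hypothetical sample
theta = 0.25

# Evaluate (6) at every observed value and keep the minimizer.
losses = np.array([quantile_distance(x, q, theta) for q in x])
q_hat = x[np.argmin(losses)]

# Empirical quantile inf{x : F_n(x) >= theta}, i.e. the ceil(n*theta)-th order statistic.
q_emp = np.sort(x)[int(np.ceil(len(x) * theta)) - 1]
```

Here `n * theta` is not an integer, so the minimizer is unique and the two values coincide.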



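The metric (6) is used to define the new component-wise quantile-based classifier. Given
two sets of observations from the two populations $\Pi_0$ and $\Pi_1$, $x_1, \ldots, x_{n_0}$ and
$y_1, \ldots, y_{n_1}$, a new observation $z = (z_1, \ldots, z_p) \in \mathbb{R}^p$ is assigned to $\Pi_0$ if

$$\sum_{j=1}^{p} \left[ \left(\theta + (1 - 2\theta) 1_{[z_j < q_{1j}(\theta)]}\right) |z_j - q_{1j}(\theta)| - \left(\theta + (1 - 2\theta) 1_{[z_j < q_{0j}(\theta)]}\right) |z_j - q_{0j}(\theta)| \right] > 0, \qquad (7)$$

where $q_{0j}(\theta)$ and $q_{1j}(\theta)$ are the marginal quantile functions of the two class-distributions
evaluated at a fixed value of $\theta$.

For $j = 1, \ldots, p$ and $k = 0, 1$, let $\Phi_j(z, \theta, q) = \left(\theta + (1 - 2\theta) 1_{[z_j < q]}\right) |z_j - q|$ and
$\Phi_{kj}(z, \theta) = \left(\theta + (1 - 2\theta) 1_{[z_j < q_{kj}(\theta)]}\right) |z_j - q_{kj}(\theta)|$. Then, for fixed $\theta$, the
classification rule (7) is equivalent to assigning z to $\Pi_0$ if
$\sum_{j=1}^{p} \Phi_{0j}(z, \theta) < \sum_{j=1}^{p} \Phi_{1j}(z, \theta)$ and to $\Pi_1$ otherwise.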



Remark 1 The applicability of the decision rule (7) to more than g = 2 classes is
straightforward. By definition, the quantile classifier rule for allocating an observation z
to one of g populations $\Pi_1, \ldots, \Pi_g$ is to allocate z to the population which gives the
lowest quantile distance $\sum_{j=1}^{p} \Phi_{kj}(z, \theta)$, with $k = 1, \ldots, g$.

Remark 2 Note that for $\theta = 0.5$ the objective function in (6) (multiplied by 2) is the
$L_1$-distance between x and the median. Therefore decision rule (7) coincides with the
component-wise median classifier when $\theta = 0.5$.
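A minimal sketch of the component-wise quantile classifier for a fixed $\theta$, written for g classes as in Remark 1, could look as follows. The function names are hypothetical, and two choices are assumptions of this sketch rather than part of the definition: `np.quantile` is used with its default interpolation instead of the infimum definition of the quantile, and exact ties are resolved in favour of the lowest class index.

```python
import numpy as np

def fit_quantiles(samples, theta):
    """Component-wise theta-quantiles for each class; samples is a list of (n_k, p) arrays."""
    return [np.quantile(np.asarray(S, dtype=float), theta, axis=0) for S in samples]

def quantile_distances(Z, q, theta):
    """Row-wise sum_j (theta + (1 - 2*theta) * 1[z_j < q_j]) * |z_j - q_j|."""
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    weights = theta + (1 - 2 * theta) * (Z < q)
    return np.sum(weights * np.abs(Z - q), axis=1)

def classify(Z, quantiles, theta):
    """Assign each row of Z to the class with the lowest quantile distance."""
    d = np.column_stack([quantile_distances(Z, q, theta) for q in quantiles])
    return np.argmin(d, axis=1)   # ties go to the lowest class index (assumption)

# Hypothetical data: two shifted exponential classes with p = 4 variables.
rng = np.random.default_rng(1)
X0 = rng.exponential(size=(101, 4))
X1 = rng.exponential(size=(101, 4)) + 0.5
Z_new = rng.exponential(size=(50, 4)) + 0.25

labels_q10 = classify(Z_new, fit_quantiles([X0, X1], 0.10), 0.10)
labels_q50 = classify(Z_new, fit_quantiles([X0, X1], 0.50), 0.50)
```

With `theta = 0.5` every weight equals 0.5, so the distances are half the $L_1$-distances to the component-wise medians and the labels agree with those of the median classifier, in line with Remark 2.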

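Given the two populations, $\Pi_0$ and $\Pi_1$, the probability of correct classification of the
quantile classifier is

$$\Psi(\theta) = \pi_0 \int 1\left[\sum_{j=1}^{p} \left(\Phi_{1j}(z, \theta) - \Phi_{0j}(z, \theta)\right) > 0\right] dP_0(z) + \pi_1 \int 1\left[\sum_{j=1}^{p} \left(\Phi_{1j}(z, \theta) - \Phi_{0j}(z, \theta)\right) < 0\right] dP_1(z). \qquad (8)$$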



This quantity represents the theoretical rate of correct classification based on the true
quantiles. This rate can be used to measure the performance of the discriminant rule
with respect to the chosen value $\theta$ regardless of the sample size (we will later simulate
such rates based on empirical quantiles, as relevant in real applications). The following
lemma provides a useful formula to derive the theoretical rate of correct classification as
a function of $\theta$ for p = 1.

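Lemma 1 When p = 1, the probability of correct classification of the quantile classifier
takes the following simple form.

- If $q_0(\theta) < q_1(\theta)$,

$$\Psi(\theta) = \pi_0 F_0(\epsilon) + \pi_1 (1 - F_1(\epsilon)) \qquad (9)$$

with $\epsilon = \theta q_0(\theta) + (1 - \theta) q_1(\theta)$.

- If $q_0(\theta) > q_1(\theta)$,

$$\Psi(\theta) = \pi_1 F_1(\epsilon) + \pi_0 (1 - F_0(\epsilon)) \qquad (10)$$

with $\epsilon = \theta q_1(\theta) + (1 - \theta) q_0(\theta)$,

where $q_0(\theta)$ and $q_1(\theta)$ are the true quantiles of the two populations.

Proof of Lemma 1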

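Consider that in the univariate case $\Phi_0(z, \theta)$ and $\Phi_1(z, \theta)$ may be rewritten as

$$\Phi_0(z, \theta) = (1 - \theta)(q_0(\theta) - z) 1_{[z < q_0(\theta)]} + \theta (z - q_0(\theta)) 1_{[z > q_0(\theta)]},$$

$$\Phi_1(z, \theta) = (1 - \theta)(q_1(\theta) - z) 1_{[z < q_1(\theta)]} + \theta (z - q_1(\theta)) 1_{[z > q_1(\theta)]}.$$

For a fixed $\theta$, the integral (8) can be easily solved by splitting it into four parts according to
the possible disjoint regions of the domain of Z with respect to $q_0(\theta)$ and $q_1(\theta)$, namely:
(a) $z < \min(q_0(\theta), q_1(\theta))$, (b) $q_0(\theta) < z < q_1(\theta)$, (c) $q_1(\theta) < z < q_0(\theta)$ and
(d) $z > \max(q_0(\theta), q_1(\theta))$.

If $z < \min(q_0(\theta), q_1(\theta))$ the integral becomes

$$\Psi_a(\theta) = \pi_0 \int_{-\infty}^{\min(q_0(\theta), q_1(\theta))} 1_{[(1-\theta)(q_1(\theta) - q_0(\theta)) > 0]} \, dP_0(z) + \pi_1 \int_{-\infty}^{\min(q_0(\theta), q_1(\theta))} 1_{[(1-\theta)(q_1(\theta) - q_0(\theta)) < 0]} \, dP_1(z)$$

$$= \pi_0 \int_{-\infty}^{q_0(\theta)} dP_0(z) \, 1_{[q_1(\theta) > q_0(\theta)]} + \pi_1 \int_{-\infty}^{q_1(\theta)} dP_1(z) \, 1_{[q_1(\theta) < q_0(\theta)]}$$

$$= \pi_0 \theta \, 1_{[q_1(\theta) > q_0(\theta)]} + \pi_1 \theta \, 1_{[q_1(\theta) < q_0(\theta)]}.$$

In the second case the integral is

$$\Psi_b(\theta) = \pi_0 \int_{q_0(\theta)}^{q_1(\theta)} 1_{[(1-\theta)(q_1(\theta) - z) - \theta(z - q_0(\theta)) > 0]} \, dP_0(z) + \pi_1 \int_{q_0(\theta)}^{q_1(\theta)} 1_{[(1-\theta)(q_1(\theta) - z) - \theta(z - q_0(\theta)) < 0]} \, dP_1(z)$$

$$= \pi_0 \int_{q_0(\theta)}^{\theta q_0(\theta) + (1-\theta) q_1(\theta)} dP_0(z) \, 1_{[q_0(\theta) < q_1(\theta)]} + \pi_1 \int_{\theta q_0(\theta) + (1-\theta) q_1(\theta)}^{q_1(\theta)} dP_1(z) \, 1_{[q_0(\theta) < q_1(\theta)]}.$$

Similarly, for the cases (c) and (d) the integrals are

$$\Psi_c(\theta) = \pi_0 \int_{\theta q_1(\theta) + (1-\theta) q_0(\theta)}^{q_0(\theta)} dP_0(z) \, 1_{[q_1(\theta) < q_0(\theta)]} + \pi_1 \int_{q_1(\theta)}^{\theta q_1(\theta) + (1-\theta) q_0(\theta)} dP_1(z) \, 1_{[q_1(\theta) < q_0(\theta)]},$$

and

$$\Psi_d(\theta) = \pi_0 (1 - \theta) 1_{[q_0(\theta) > q_1(\theta)]} + \pi_1 (1 - \theta) 1_{[q_0(\theta) < q_1(\theta)]}.$$

Now, when $q_0(\theta) < q_1(\theta)$, $\Psi(\theta)$ is the sum of $\Psi_a(\theta)$, $\Psi_b(\theta)$ and $\Psi_d(\theta)$ corresponding to
disjoint domain regions of Z. Writing $\epsilon = \theta q_0(\theta) + (1-\theta) q_1(\theta)$:

$$\Psi(\theta) = \pi_0 \theta + \pi_0 \int_{q_0(\theta)}^{\epsilon} dP_0(z) + \pi_1 \int_{\epsilon}^{q_1(\theta)} dP_1(z) + \pi_1 (1 - \theta)$$

$$= \pi_0 \theta + \pi_0 F_0(\theta q_0(\theta) + (1 - \theta) q_1(\theta)) - \pi_0 \theta + \pi_1 \theta - \pi_1 F_1(\theta q_0(\theta) + (1 - \theta) q_1(\theta)) + \pi_1 (1 - \theta)$$

$$= \pi_0 F_0(\epsilon) + \pi_1 (1 - F_1(\epsilon)).$$

Analogously, when $q_0(\theta) > q_1(\theta)$, $\Psi(\theta)$ is the sum of $\Psi_a(\theta)$, $\Psi_c(\theta)$ and $\Psi_d(\theta)$, from which,
with $\epsilon = \theta q_1(\theta) + (1-\theta) q_0(\theta)$:

$$\Psi(\theta) = \pi_1 \theta + \pi_0 \int_{\epsilon}^{q_0(\theta)} dP_0(z) + \pi_1 \int_{q_1(\theta)}^{\epsilon} dP_1(z) + \pi_0 (1 - \theta)$$

$$= \pi_1 F_1(\epsilon) + \pi_0 (1 - F_0(\epsilon)).$$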

Lemma 1 provides a direct formula to compute the probability of correct classification,
analytically or numerically, for given values of $\theta$. Suppose the two populations $\Pi_0$ and
$\Pi_1$ have exponential distributions but differ by a location shift c: $X \sim P_0 = Exp(\lambda)$
and $Y \sim P_1 = Exp(\lambda) + c$, with $c > 0$. Then $F_0(x) = 1 - \exp(-\lambda x)$ and
$F_1(y) = 1 - \exp(-\lambda (y - c))$. Since the probability distribution functions of the exponentials
can be expressed in closed form, the two quantile functions can be analytically derived by
inverting $F_0$ and $F_1$, from which $q_0(\theta) = -\frac{\ln(1-\theta)}{\lambda}$ and
$q_1(\theta) = -\frac{\ln(1-\theta)}{\lambda} + c$, respectively.

Since $c > 0$, we have $q_0(\theta) < q_1(\theta)$ $\forall \theta \in [0, 1]$. By applying (9), we get the rates of correct
classification of the quantile classifier for two (varying-location) exponential distributions
as a function of $\theta$:

$$\Psi(\theta) = \pi_0 - (1 - \theta) e^{\lambda c \theta} \left(\pi_0 e^{-\lambda c} - \pi_1\right).$$
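The closed form for the shifted exponentials can be cross-checked against formula (9) of Lemma 1 evaluated directly from the distribution functions. The sketch below uses the hypothetical function names `psi_closed_form` and `psi_lemma1` and the parameter values $\lambda = 1$, $c = 0.5$, $\pi_0 = 0.5$; it is a numerical illustration only.

```python
import numpy as np

def psi_closed_form(theta, lam=1.0, c=0.5, pi0=0.5):
    """Psi(theta) = pi0 - (1 - theta) * exp(lam*c*theta) * (pi0*exp(-lam*c) - pi1)."""
    pi1 = 1.0 - pi0
    return pi0 - (1 - theta) * np.exp(lam * c * theta) * (pi0 * np.exp(-lam * c) - pi1)

def psi_lemma1(theta, lam=1.0, c=0.5, pi0=0.5):
    """Formula (9): pi0 * F0(eps) + pi1 * (1 - F1(eps)), eps = theta*q0 + (1 - theta)*q1."""
    q0 = -np.log(1 - theta) / lam          # quantile function of Exp(lam)
    q1 = q0 + c                            # quantile function of Exp(lam) + c
    eps = theta * q0 + (1 - theta) * q1
    F0 = 1 - np.exp(-lam * eps)
    # F1 is zero below the shift c; clip the argument so the exponent stays non-positive.
    F1 = 1 - np.exp(-lam * np.clip(eps - c, 0.0, None))
    return pi0 * F0 + (1 - pi0) * (1 - F1)

thetas = np.linspace(0.01, 0.99, 99)
closed = psi_closed_form(thetas)
lemma = psi_lemma1(thetas)
misclassification = 1 - closed
best_theta = thetas[np.argmin(misclassification)]
```

For these parameter values the misclassification rate is increasing in $\theta$ on the grid, so `best_theta` is the smallest grid point.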

Figure 1 (second panel of third row) shows the theoretical misclassification rates, $1 - \Psi(\theta)$,
of two exponential populations with $\lambda = 1$, $c = 0.5$ and $\pi_0 = \pi_1 = 0.5$. It is
interesting to note that the minimum misclassification rate can be obtained for $\theta$ approaching
zero. This particular choice for $\theta$ is related to the high level of skewness of the
exponential distribution. To make this clearer, we also considered further scenarios, namely
two location-shifted Gaussians, $\mathcal{N}(0, 1)$ and $\mathcal{N}(1, 1)$, and two location-shifted chi-squared
distributions with 5 degrees of freedom and shift c = 2 (first and second rows of Figure
1). The theoretical misclassification rates, $1 - \Psi(\theta)$, can be easily obtained numerically.
In the Gaussian scenario the minimum value of $1 - \Psi(\theta)$ is obtained for $\theta = 0.5$. This
is not surprising because of the symmetric shape of the Gaussian. But more asymmetric
distributions (second and third rows in Figure 1) tend to yield an optimum $\theta$ far away
from the midpoint 0.5, with positive skewness normally associated with the optimum
being below 0.5 and negative skewness with an optimum above 0.5 (obviously, if skewness
is reversed by multiplying a random variable by -1, the resulting optimal $\theta$ will be one
minus the original optimum). This indicates that the best $\theta$ for one problem is not the
best for another, and this choice is of crucial importance. For example, in the second case,
the theoretical misclassification rate is minimized for $\theta = 0.236$. The fourth row of Figure 1
shows the classification problem with two differently distributed populations, a Gaussian
distribution with parameters 5 and 1 and a chi-squared distribution with 4 degrees of
freedom. The optimal quantile classifier corresponds to $\theta = 0.162$.

Figure 1: Theoretical misclassification rates $1 - \Psi(\theta)$ for four different scenarios. First
row: probability density functions of two location-shifted Gaussians and corresponding
misclassification function of $\theta$. Second row: two location-shifted chi-squared distributions.
Third row: two location-shifted exponentials. Last row: a Gaussian vs a chi-squared
distribution.

Figure 2 shows the estimated misclassification rates obtained in the four scenarios by a
simulation study with sample sizes of training set and test set equal to 500. The plotted
line is the empirical curve of the misclassification rate obtained in the test set for different
values of $\theta$. It approximates the theoretical one well. The horizontal lines indicate the
misclassification rates obtained by the centroid classifier, the median classifier and quantile
classifier corresponding to the optimal value $\theta$ chosen in the training set.

Unfortunately, Lemma 1 cannot easily be extended to the multivariate setting, unless
some very restrictive conditions are assumed regarding independence of the variables and
strict rules about the ranking of the p different quantiles $q_{kj}(\theta)$, $j = 1, \ldots, p$, within each
population.

2.3 The empirically optimal quantile classifier 

In real applications the problem of the choice of the quantile value in the family of possible 
quantile classifiers can be addressed by selecting the optimum 9 based on misclassification 
rates in the training sample. This leads to the definition of the empirically optimal quantile 
classifier. 

First, we introduce some notation. Let $(Z_1, C_1), (Z_2, C_2), \ldots$ be i.i.d. $\mathbb{R}^p \times \{0, 1\}$-valued
RV. Let $Z_1$ be distributed according to a 2-component mixture of distributions
$P_0 = \mathcal{L}(Z_1 | C_1 = 0)$ and $P_1 = \mathcal{L}(Z_1 | C_1 = 1)$. Let $\pi_0 = P\{C_1 = 0\}$, $\pi_1 = 1 - \pi_0$. Let
$P_{01}, \ldots, P_{0p}$ denote the marginal distributions of $P_0$, analogously $P_{11}, \ldots, P_{1p}$.
For arbitrarily small $0 < \tau < \frac{1}{2}$ define $T = [\tau, 1 - \tau]$. For $\theta \in [0, 1]$, $j = 1, \ldots, p$, $k = 0, 1$
denote by $q_{kj}(\theta)$ the $\theta$-quantile of $P_{kj}$. For given $(Z_1, C_1), \ldots, (Z_n, C_n)$ let $q_{kjn}(\theta)$ be the
empirical $\theta$-quantile for the subsample defined by $C_i = k$, $i = 1, \ldots, n$.

For $j = 1, \ldots, p$, $k = 0, 1$, $z = (z_1, \ldots, z_p) \in \mathbb{R}^p$, let
$\Phi_j(z, \theta, q) = (\theta + (1 - 2\theta) 1[z_j < q]) |z_j - q|$ (in abuse of notation, assumption B2 of
Theorem 2 will apply $\Phi_j$ to infinite-dimensional z). $\Phi_{kj}(z, \theta)$ is used for $\Phi_j(z, \theta, q_{kj}(\theta))$.
$\Phi_{kjn}(z, \theta)$ is used for $\Phi_j(z, \theta, q_{kjn}(\theta))$.
The empirically optimal quantile classifier is defined by assigning Z to $\Pi_0$ if

$$\sum_{j=1}^{p} \left(\Phi_{1jn}(Z, \theta_n) - \Phi_{0jn}(Z, \theta_n)\right) > 0, \qquad (11)$$

where $\theta_n = \arg\max_{\theta \in T} \Psi_n(\theta)$ is the estimated optimal $\theta$ from $(Z_1, C_1), \ldots, (Z_n, C_n)$ (if the
argmax is not unique, any maximiser can be chosen), and the observed rate of correct
classification in data $(z_1, c_1), \ldots, (z_n, c_n)$ is

$$\Psi_n(\theta) = \frac{1}{n} \left( \sum_{i:\, C_i = 0} 1\left[\sum_{j=1}^{p} \left(\Phi_{1jn}(Z_i, \theta) - \Phi_{0jn}(Z_i, \theta)\right) > 0\right] + \sum_{i:\, C_i = 1} 1\left[\sum_{j=1}^{p} \left(\Phi_{1jn}(Z_i, \theta) - \Phi_{0jn}(Z_i, \theta)\right) < 0\right] \right).$$

Figure 2: Misclassification rates obtained in the test set of a simulation study. For
comparative purposes, the horizontal lines indicate the misclassification rates of the median
classifier, of the centroid classifier and of the optimal quantile classifier in the training set.



Note that we look for the optimal value of $\theta$ in T, a closed interval not containing zero.
In practice, a small nonzero $\tau$ needs to be chosen, and $\Psi_n(\theta)$ is evaluated on a grid of
equi-spaced values between $\tau$ and $1 - \tau$. T will in practice depend on the number of
observations; $\tau$ should be chosen small, but large enough that there is still a certain
amount of information to estimate the $\tau$-quantile.
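The grid search over T can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the default grid (`tau = 0.05`, 19 points) and the simulated data are assumptions, `np.quantile` is used with its default interpolation, and an observation with a zero score is assigned to class 1, following the "otherwise" branch of the decision rule.

```python
import numpy as np

def quantile_distance(Z, q, theta):
    """Row-wise sum_j (theta + (1 - 2*theta) * 1[z_j < q_j]) * |z_j - q_j|."""
    weights = theta + (1 - 2 * theta) * (Z < q)
    return np.sum(weights * np.abs(Z - q), axis=1)

def training_accuracy(Z, c, theta):
    """Observed rate of correct classification in the training data for a given theta."""
    q0 = np.quantile(Z[c == 0], theta, axis=0)     # class-wise empirical quantiles
    q1 = np.quantile(Z[c == 1], theta, axis=0)
    score = quantile_distance(Z, q1, theta) - quantile_distance(Z, q0, theta)
    predicted = np.where(score > 0, 0, 1)
    return np.mean(predicted == c)

def select_theta(Z, c, tau=0.05, n_grid=19):
    """Return the grid value in [tau, 1 - tau] with the highest training accuracy."""
    grid = np.linspace(tau, 1 - tau, n_grid)
    acc = np.array([training_accuracy(Z, c, t) for t in grid])
    return grid[np.argmax(acc)], grid, acc          # first maximiser if not unique

# Hypothetical training data: p = 5 independent exponentials, class 1 shifted by 0.5.
rng = np.random.default_rng(3)
n_per_class, p = 200, 5
Z = np.vstack([rng.exponential(size=(n_per_class, p)),
               rng.exponential(size=(n_per_class, p)) + 0.5])
c = np.repeat([0, 1], n_per_class)

theta_hat, grid, acc = select_theta(Z, c)
```

Taking the first maximiser is only one admissible choice, since any maximiser may be used when the argmax is not unique.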

The asymptotic probability of correct classification of the quantile classifier is defined in
(8). Let $\tilde{\theta} = \arg\max_{\theta \in T} \Psi(\theta)$ be the optimal $\theta$ regarding the true model.

Remark 3 The empirically optimal quantile classifier is based on finding a single $\theta$ that
is optimal looking at all variables simultaneously. One could wonder whether it would be
better to choose different $\theta$-values for each variable. Different variables may show different
amounts of skewness or skewness in different directions, and a single $\theta$ may not do them
all justice at the same time. However, choosing different $\theta$-values for different variables
is tricky. We have tried choosing variable-wise $\theta$-values by looking at misclassification
rates obtained from looking at p classification problems, each based on a single variable,
and then we used the resulting variable-wise $\theta$-values for a classification rule incorporating
all variables. This yielded clearly worse results than selecting a single $\theta$ by looking at all
variables together, because the misclassification rates based on a single variable are not
very informative for the misclassification result based on all variables simultaneously. A
more promising approach would be to iteratively choose variable-wise $\theta$-values by looking
at the empirical misclassification rates based on all variables, but this requires much more
computational effort and could easily lead to overfitting, particularly with many variables
and not so many observations.

According to Remark 3, a single $\theta$ quantile for all variables implies the choice of a
compromise value that optimally discriminates between the classes. As previously observed
in the univariate setting, $\theta$ will strongly depend on the skewness of the involved
distributions. In practice, a set of p > 1 measurements could be skewed in different directions;
as a consequence, $\theta$ could be a poor compromise averaging out the influences from the
different variables. In order to overcome this problem, we recommend unifying the direction
of skewness of the variables by applying sign changes.

More specifically, compute a skewness measure separately for each variable, such as the
conventional third standardized empirical moment or, alternatively, a measure from the
family of the robust quantile-based quantities (Hinkley, 1975):

$$\frac{F^{-1}(u) + F^{-1}(1 - u) - 2 F^{-1}(1/2)}{F^{-1}(u) - F^{-1}(1 - u)}$$

where F denotes the marginal cumulative distribution function and u a fixed value in the
interval [0.5, 1]. When u = 3/4 the previous expression corresponds to Galton's measure
of skewness, for u = 0.9 it corresponds to the less robust Kelley's measure of skewness.
Evaluate the amount of skewness of each variable separately within classes, in order to
avoid overall masking effects due to unbalanced populations, and then summarize by
averaging all the within-class measures with equal weights. The signs of variables with
negative skewness are then changed, so that finally the variables used for the quantile
estimator all have the same (positive) direction of skewness.
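The skewness correction described above can be sketched as follows, using the quantile-based measure with u = 3/4 (Galton's measure). The function names and the simulated data are assumptions made for this illustration, and the sketch presumes that the u- and (1-u)-quantiles differ within each class so that the denominator is nonzero.

```python
import numpy as np

def quantile_skewness(x, u=0.75):
    """(F^-1(u) + F^-1(1-u) - 2 F^-1(1/2)) / (F^-1(u) - F^-1(1-u)) from sample quantiles."""
    hi, lo, med = np.quantile(x, [u, 1 - u, 0.5])
    return (hi + lo - 2 * med) / (hi - lo)

def skewness_signs(Z, c, u=0.75):
    """+1 / -1 per variable: average the within-class skewness with equal weights."""
    classes = np.unique(c)
    skew = np.array([[quantile_skewness(Z[c == k, j], u) for j in range(Z.shape[1])]
                     for k in classes])
    return np.where(skew.mean(axis=0) < 0, -1.0, 1.0)

# Hypothetical data: variable 0 is right-skewed, variable 1 is left-skewed.
rng = np.random.default_rng(4)
n = 2000
class0 = np.column_stack([rng.exponential(size=n), -rng.exponential(size=n)])
class1 = np.column_stack([rng.exponential(size=n), -rng.exponential(size=n)]) + 0.5
Z = np.vstack([class0, class1])
c = np.repeat([0, 1], n)

signs = skewness_signs(Z, c)
Z_adjusted = Z * signs        # all variables now have positive skewness direction
```

The same sign vector would have to be applied to any new observation before classifying it.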

Another computational detail is that in case of a tie (i.e., equal training set
misclassification rates for different values of $\theta$, which can easily happen for data sets
with small n), we recommend fitting a quadratic polynomial to the misclassification rate as
a function of $\theta$ and choosing the optimum $\theta$ according to this fit out of the empirically
optimal ones.

In the next section, we will present some theoretical properties of the proposed classifier.
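The tie-breaking rule recommended in Section 2.3 could be sketched as below. The function name `break_ties` and the toy error curve are hypothetical; `np.polyfit` performs an ordinary least-squares fit of a degree-2 polynomial, which is one plausible reading of "fitting a quadratic polynomial".

```python
import numpy as np

def break_ties(grid, error):
    """Among grid values with minimal training error, pick the one with the lowest
    value of a quadratic polynomial fitted to the whole error curve."""
    grid = np.asarray(grid, dtype=float)
    error = np.asarray(error, dtype=float)
    tied = np.flatnonzero(error == error.min())
    if tied.size == 1:
        return grid[tied[0]]
    coef = np.polyfit(grid, error, deg=2)          # least-squares quadratic fit
    fitted = np.polyval(coef, grid[tied])
    return grid[tied[np.argmin(fitted)]]

# Hypothetical training error curve with a flat minimum between 0.3 and 0.7.
grid = np.linspace(0.1, 0.9, 9)
error = np.array([0.4, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.4])
theta_choice = break_ties(grid, error)
```

Because the toy curve is symmetric around 0.5, the fitted parabola has its vertex there and the tie is resolved in favour of the middle grid value.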



3 Consistency of the empirically optimal quantile classifier

The theory needs the following assumptions:

A1 For all $j = 1, \ldots, p$, $k = 0, 1$: $q_{kj}$ is a continuous function of $\theta \in T$.
A2 For all $\theta \in T$, $P\left\{\sum_{j=1}^{p} \left(\Phi_{1j}(Z, \theta) - \Phi_{0j}(Z, \theta)\right) = 0\right\} = 0$.

If A1 and A2 are not fulfilled, there may be ambiguities regarding the optimal quantile or
the classification of a set of points with nonzero probability. In case of violation of A2, the
problem caused by this will affect a subset of the data space with at most the probability
given in A2. A1 will probably only affect consistency if violation happens around the
optimal $\theta$, and probably only weakly so if the discontinuity is small.

Theorem 1 Assume A1 and A2. Then, for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\{|\Psi(\tilde{\theta}) - \Psi(\theta_n)| > \epsilon\} = 0.$$

This means that for $n \to \infty$ the optimal true correct classification probability equals the
true one corresponding to the empirically optimal $\theta_n$, i.e., the $\theta$ chosen for the quantile
classifier, which is therefore asymptotically optimal (and therefore at least as good as
$\theta = \frac{1}{2}$, which defines the median classifier).

Remark 4 Theorem 1 is based on g = 2 and ignores the skewness adjustment proposed 
above. This is for convenience of the proof only. Arguments carry over to g > 2 in a 
straightforward manner. Skewness adjustment requires that the skewness of all variables is 
estimated correctly with probability 1 for large enough n, which does not require additional 
assumptions if the skewness statistic is quantile-based, but will need a moment condition 
for classical skewness. 






Proof of Theorem 1

$$|\Phi_j(z, \theta_1, q_1) - \Phi_j(z, \theta_2, q_2)| \le |z_j| |\theta_2 - \theta_1| + 4 |q_2 - q_1| \qquad (12)$$

is proved below as Lemma 2 for $j = 1, \ldots, p$ and $\theta_i$-quantiles $q_i$, $i = 1, 2$. Together with A1,
this implies the continuity of $\Psi$, because for given z, $\Phi_{kj}$ is a continuous function of $\theta$,
and the dominated convergence theorem makes the integrals of the indicator functions
converge for $\theta_n \to \theta$.

The proof of Theorem 1 is now based on

$$|\Psi(\tilde{\theta}) - \Psi(\theta_n)| \le |\Psi(\tilde{\theta}) - \Psi_n(\tilde{\theta})| + |\Psi_n(\tilde{\theta}) - \Psi_n(\theta_n)| + |\Psi_n(\theta_n) - \Psi(\theta_n)|. \qquad (13)$$

In order to show that all three terms on the right side are asymptotically small, the
following result is proved below as Lemma 3:

$$\forall \epsilon > 0: \quad \lim_{n \to \infty} P\left\{\sup_{\theta \in T} |\Psi_n(\theta) - \Psi(\theta)| > \epsilon\right\} = 0. \qquad (14)$$

(14) forces the first and third term on the right side of (13) to converge to zero in
probability. Consider now the second term. By definition,

$$\Psi_n(\theta_n) \ge \Psi_n(\tilde{\theta}), \qquad \Psi(\tilde{\theta}) \ge \Psi(\theta_n).$$

Using (14) again, for large n both $|\Psi_n(\tilde{\theta}) - \Psi(\tilde{\theta})|$ and $|\Psi_n(\theta_n) - \Psi(\theta_n)|$ will be arbitrarily
small with arbitrarily large probability, and this makes $|\Psi_n(\tilde{\theta}) - \Psi_n(\theta_n)|$ arbitrarily small,
too. Altogether, this proves the theorem.

Lemma 2 (12) holds for $j \in \{1, \ldots, p\}$, $\theta_1, \theta_2 \in (0, 1)$, $q_1, q_2 \in \mathbb{R}$, assuming
$\theta_1 \le \theta_2 \Rightarrow q_1 \le q_2$ and analogously for $\ge$ (as holds if $q_k$ is a quantile belonging to $\theta_k$).

Proof of Lemma 2 Assume w.l.o.g. $q_1 \le q_2$, $0 < \theta_1 \le \theta_2 < 1$. Consider $z_j \le q_1$,
$q_1 < z_j \le q_2$, $q_2 < z_j$ separately; first $z_j \le q_1$. By definition,

$$|\Phi_j(z, \theta_1, q_1) - \Phi_j(z, \theta_2, q_2)| = |(1 - \theta_1)(q_1 - z_j) - (1 - \theta_2)(q_2 - z_j)|$$

$$= |(q_1 - q_2) + (\theta_1 + \theta_2)(q_2 - q_1) - \theta_1 q_2 + \theta_2 q_1 + z_j (\theta_1 - \theta_2)|$$

$$\le |q_2 - q_1| + |\theta_1 + \theta_2| |q_2 - q_1| + \theta_2 |q_2 - q_1| + |z_j (\theta_1 - \theta_2)| \le |z_j| |\theta_2 - \theta_1| + 4 |q_2 - q_1|.$$

For $q_1 < z_j \le q_2$:

$$|\Phi_j(z, \theta_1, q_1) - \Phi_j(z, \theta_2, q_2)| = |\theta_1 (z_j - q_1) - (1 - \theta_2)(q_2 - z_j)| \le |q_2 - q_1|.$$

For $q_2 < z_j$:

$$|\Phi_j(z, \theta_1, q_1) - \Phi_j(z, \theta_2, q_2)| = |\theta_1 (z_j - q_1) - \theta_2 (z_j - q_2)|$$

and (12) follows along the lines of the first case.

Lemma 3 (14) holds under the conditions of Theorem 1.






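Proof of Lemma 3 Suppose (14) were wrong. This means that there exist $\epsilon > 0$, $\delta > 0$,
a subsequence M of (1, 2, . . .) and $(\theta^*_m)_{m \in M}$ such that

$$\forall m \in M: \quad P\{|\Psi_m(\theta^*_m) - \Psi(\theta^*_m)| > \epsilon\} \ge \delta. \qquad (15)$$

W.l.o.g. (because $(\theta^*_m)_{m \in M} \in T$ is bounded and at least a subsequence has a limit)
there exists $\theta^* = \lim_{m \to \infty} \theta^*_m$.

Consider

$$|\Psi_m(\theta^*_m) - \Psi(\theta^*_m)| \le |\Psi_m(\theta^*_m) - \Psi_m(\theta^*)| + |\Psi_m(\theta^*) - \Psi(\theta^*)| + |\Psi(\theta^*) - \Psi(\theta^*_m)|. \qquad (16)$$

Continuity of $\Psi$ forces the third term of the right side of (16) to converge to 0.
Regarding the second term, define a version of $\Psi_n$ using the true quantiles instead of the
empirical ones:

$$\Psi^*_n(\theta) = \frac{1}{n} \left( \sum_{i:\, C_i = 0} 1\left[\sum_{j=1}^{p} \left(\Phi_j(Z_i, \theta, q_{1j}(\theta)) - \Phi_j(Z_i, \theta, q_{0j}(\theta))\right) > 0\right] + \sum_{i:\, C_i = 1} 1\left[\sum_{j=1}^{p} \left(\Phi_j(Z_i, \theta, q_{1j}(\theta)) - \Phi_j(Z_i, \theta, q_{0j}(\theta))\right) < 0\right] \right).$$

Consider

$$|\Psi_m(\theta^*) - \Psi(\theta^*)| \le |\Psi_m(\theta^*) - \Psi^*_m(\theta^*)| + |\Psi^*_m(\theta^*) - \Psi(\theta^*)|.$$

Because of the strong law of large numbers, $\lim_{m \to \infty} |\Psi^*_m(\theta^*) - \Psi(\theta^*)| = 0$ a.s.
For given z and $\theta$, $\Phi_j$ is continuous in q. Furthermore quantiles are strongly consistent,
and this enforces $\lim_{m \to \infty} |\Psi_m(\theta^*) - \Psi^*_m(\theta^*)| = 0$ a.s.

Now consider the first term of the right side of (16).

$$|q_{kjm}(\theta^*_m) - q_{kjm}(\theta^*)| \le |q_{kjm}(\theta^*_m) - q_{kj}(\theta^*_m)| + |q_{kjm}(\theta^*) - q_{kj}(\theta^*)| + |q_{kj}(\theta^*_m) - q_{kj}(\theta^*)|. \qquad (17)$$

From Theorem 3 in Mason (1982), which assumes A1,
$\lim_{m \to \infty} \sup_{\theta \in T} |q_{kj}(\theta) - q_{kjm}(\theta)| = 0$ a.s. This enforces the first two terms on the
right side of (17) to converge to zero a.s. The last term converges to zero because of A1.
Therefore

$$|q_{kjm}(\theta^*_m) - q_{kjm}(\theta^*)| \to 0 \quad \text{a.s.} \qquad (18)$$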

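Let $D_n(\theta, z) = \sum_{j=1}^{p} \left(\Phi_{1jn}(z, \theta) - \Phi_{0jn}(z, \theta)\right)$,
$D(\theta, z) = \sum_{j=1}^{p} \left(\Phi_{1j}(z, \theta) - \Phi_{0j}(z, \theta)\right)$. For $\epsilon > 0$ define

$$Z_\epsilon = \{z : |D(\theta^*, z)| \ge \epsilon\} \cap \left\{z : \sum_{j=1}^{p} |z_j| \le \frac{1}{\epsilon}\right\},$$

so that

$$|\Psi_m(\theta^*_m) - \Psi_m(\theta^*)| = \frac{1}{m} \Bigg| \sum_{i:\, C_i = 0,\, Z_i \notin Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) > 0) - 1(D_m(\theta^*, Z_i) > 0)\right] + \sum_{i:\, C_i = 1,\, Z_i \notin Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) < 0) - 1(D_m(\theta^*, Z_i) < 0)\right]$$

$$+ \sum_{i:\, C_i = 0,\, Z_i \in Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) > 0) - 1(D_m(\theta^*, Z_i) > 0)\right] + \sum_{i:\, C_i = 1,\, Z_i \in Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) < 0) - 1(D_m(\theta^*, Z_i) < 0)\right] \Bigg|.$$

Now for large m and arbitrarily small $\delta > 0$,

$$\frac{1}{m} \Bigg| \sum_{i:\, C_i = 0,\, Z_i \notin Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) > 0) - 1(D_m(\theta^*, Z_i) > 0)\right] + \sum_{i:\, C_i = 1,\, Z_i \notin Z_\epsilon} \left[1(D_m(\theta^*_m, Z_i) < 0) - 1(D_m(\theta^*, Z_i) < 0)\right] \Bigg| \le 1 - P(Z_\epsilon) + \delta \quad \text{a.s.}$$

Furthermore, by (12)

$$|D_m(\theta^*_m, Z_i) - D_m(\theta^*, Z_i)| \le \sum_{j=1}^{p} \left(2 |Z_{ij}| |\theta^*_m - \theta^*| + 8 \max_{k \in \{0, 1\}} |q_{kjm}(\theta^*_m) - q_{kjm}(\theta^*)|\right).$$

Because $|\theta^*_m - \theta^*| \to 0$, by (18) and $|Z_{ij}| \le \frac{1}{\epsilon}$ for $Z_i \in Z_\epsilon$ this difference becomes
arbitrarily small a.s. for large enough m, and therefore for $Z_i \in Z_\epsilon$, $D_m(\theta^*_m, Z_i)$ and
$D_m(\theta^*, Z_i)$ will for large enough m be on the same side of zero and their "> 0" and
"< 0"-indicators will therefore be the same, a.s.

For $\epsilon \searrow 0$, A2 enforces $P(Z_\epsilon) \to 1$. This forces the first term on the right side of (16) to
zero for large m, a.s., in contradiction to (15), which in turn proves (14).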



4 A result for $p \to \infty$



Theorem 1 refers to $n \to \infty$ for fixed finite p. In many modern applications, p is so
large and often larger than n that results for $p \to \infty$ seem more appealing, although
such results require $n \to \infty$ as well and it is not entirely clear whether they give a better
justification of a method for applications with given n and p. In any case they contribute
to the exploration of a classifier's properties.

Hall et al. (2009) prove under some conditions that the misclassification probability of the
median classifier converges to zero for $n, p \to \infty$. Unfortunately we were not able to prove
a result ensuring that the quantile classifier is, asymptotically, always at least as good as,
and sometimes better than, the median classifier, as one would hope. Analyzing the proof in
Hall et al. (2009), it can be seen that it adapts in a more or less straightforward manner
to classifiers based on any fixed quantile other than the median. Despite the fact that one
may expect the quantile classifier to do at least as good a job (because it incorporates
finding the optimal quantile), this classifier is more difficult to handle theoretically.

We present a result that requires stronger assumptions than those in Hall et al. (2009),
namely considering them uniformly for a range of quantiles. The arguments in Hall et al.
(2009) then ensure that the zero misclassification result carries over to classifiers based on
whatever quantile selection rule is chosen, obviously including selecting the empirically
optimal one. We restrict ourselves to applying this idea to Theorem 1 in Hall et al. (2009).
Let again $T = [\tau, 1 - \tau]$ for arbitrarily small $0 < \tau < \frac{1}{2}$. Let $U = (U_1, U_2, \ldots)$ denote an
infinite sequence of random variables, each $U_i$ with uniquely defined $\theta$-quantiles $q_i(\theta)$ for all
$\theta \in T$ and median zero. For infinite sequences of constants $(\nu_{X1,\frac{1}{2}}, \nu_{X2,\frac{1}{2}}, \ldots)$ and
$(\nu_{Y1,\frac{1}{2}}, \nu_{Y2,\frac{1}{2}}, \ldots)$ assume that for each p, the p-vectors $X_1, \ldots, X_m$ are identically
distributed as $(\nu_{X1,\frac{1}{2}} + U_1, \ldots, \nu_{Xp,\frac{1}{2}} + U_p)$, the p-vectors $Y_1, \ldots, Y_n$ are identically
distributed as $(\nu_{Y1,\frac{1}{2}} + U_1, \ldots, \nu_{Yp,\frac{1}{2}} + U_p)$. Define for $i \ge 1$ the quantiles
$\nu_{Xi,\theta} = \nu_{Xi,\frac{1}{2}} + q_i(\theta)$, $\nu_{Yi,\theta} = \nu_{Yi,\frac{1}{2}} + q_i(\theta)$.

Let C be a {0, 1}-valued RV and assume Z to be distributed as $X_1$ if C = 0 and as $Y_1$ if
C = 1, and $X_1, \ldots, X_m$, $Y_1, \ldots, Y_n$ and (Z, C) as totally independent.

Assumptions: 

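B1 $\lim_{\lambda \to \infty} \sup_{k \ge 1} E\{|U_k| \, I(|U_k| > \lambda)\} = 0$.

B2 For each $c > 0$:

$$\inf_{k \ge 1} \inf_{|x| > c} \inf_{\theta \in T} \left[ E\Phi_k(U, \theta, q_k(\theta) + x) - E\Phi_k(U, \theta, q_k(\theta)) \right] > 0.$$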

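B3 For each $\epsilon > 0$:

$$\inf_{k \ge 1} \inf_{\theta \in T} \left[ \min\{\theta - P[U_k < q_k(\theta) - \epsilon],\ \theta - P[U_k > q_k(\theta) + \epsilon]\} \right] > 0.$$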

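B4 With $\mathcal{B}$ denoting the class of Borel subsets of the real line,

$$\lim_{k \to \infty} \sup_{k_1, k_2 : |k_1 - k_2| \ge k} \ \sup_{B_1, B_2 \in \mathcal{B}} \left| P(U_{k_1} \in B_1, U_{k_2} \in B_2) - P(U_{k_1} \in B_1) P(U_{k_2} \in B_2) \right| = 0.$$

B5 The differences $|\nu_{Xk,\theta} - \nu_{Yk,\theta}|$ are uniformly bounded.

B6 For sufficiently small $\epsilon > 0$, the proportion of values $k \in \{1, \ldots, p\}$ for which
$|\nu_{Xk,\theta} - \nu_{Yk,\theta}| > \epsilon$ $\forall \theta \in T$ is bounded away from zero as $p$ diverges.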



The assumptions B1 and B4 are identical to (4.1) and (4.4) in Hall et al. (2009). B2, B3,
B5 and B6 are (4.2), (4.3), (4.5), (4.6) in Hall et al. (2009) enforced to hold uniformly for
all $\theta \in T$. B4 and B6 enforce a steady flow of relevant information to be added by the data
for increasing $p$. Note that both conditions together mean that at any stage an infinite
amount of relevant information in new variables, independent of what is already known,
is still waiting to be discovered. This may look unrealistic, but such a thing is essentially
needed for any theory for any method based on $p \to \infty$ faster than $n$ and $m$. B1 and
B5 are needed, given B6, to prevent classification from being dominated by a single variable or a
finite number of variables; B2 and B3 are about uniform continuity and well-definedness
of the quantiles. See Hall et al. (2009) for further discussion of these assumptions.
Let $R : \mathbb{N} \mapsto T$ be any quantile selection rule. Let $\mathcal{R}_{m,n,i}$, $i \in \mathbb{N}$, be the sequence of $\{0, 1\}$-
valued $R(i)$-quantile classifiers computed from $[(X_1, 0), \ldots, (X_m, 0), (Y_1, 1), \ldots, (Y_n, 1)]$.



Theorem 2 Assume B1-B6 and that both $n$ and $m$ diverge as $p \to \infty$. Then, with
probability converging to 1 as $p$ increases, the classifier $\mathcal{R}_{m,n,p}$ makes the correct decision,
i.e.,

$$P\{\mathcal{R}_{m,n,p}(Z) = 1 \mid C = 0\} + P\{\mathcal{R}_{m,n,p}(Z) = 0 \mid C = 1\} \to 0.$$



Proof of Theorem 2. In the proof of Theorem 1 in Hall et al. (2009), B2, B3, B5 and B6
enforce every statement to hold uniformly for $\theta \in T$, after definitions have been adapted
to general quantile classifiers (i.e., $W_k$, $D_k$, $D(Z)$, $S_{X,d}(Z)$, $\mathcal{K}$ and $d_k$ need to be defined
as functions of $\theta$ with quantiles replacing medians, $\Phi_k$ replacing the absolute value where
B2 is applied and $q_k(\theta)$ replacing zero where B3 is applied). Equations (A.1)-(A.6) in
Hall et al. (2009) then hold uniformly over $T$.






Remark 5 Again, generalization to g > 2 should be straightforward. Incorporation of 
the skewness adjustment may be more tricky but chances are that the statement still holds 
in that case (apart from potential moment problems in case of the traditional skewness 
statistic). 



Similar arguments should be possible regarding Theorem 2 in Hall et al. (2009), which has
different assumptions.



5 Numerical results 
5.1 Simulation study 

We evaluated the performance of the component-wise quantile classifier by a large simulation
study comprising several simulated experiments with the aim of assessing the effect of
the following factors: sample size, dimensionality, shape of the class distributions and
different levels of relevance of the variables for classification. We generated $p$-vectors from
$g = 2$ populations in three different main scenarios. In the first scenario we considered
symmetric Student's $t$-distributed variables $W_j$ ($j = 1, \ldots, p$) with 3 degrees of freedom.
We simulated two location-shifted populations from $W_j$ as $X_j = W_j$ and $Y_j = W_j + 0.5$.
In the second setting we tested the behavior of the classifiers in highly skewed data, by
generating identically distributed variables $W_j$, $j = 1, \ldots, p$, from a multivariate Gaussian
distribution, and transforming them using the exponential function, $X_j = \exp(W_j)$
and $Y_j = \exp(W_j) + 0.2$. In the third scenario we considered differing distributions for
the $p$ variables. More specifically, we first generated $W_j$ from a multivariate Gaussian
distribution and then we split the $p$ variables into 5 balanced blocks with different transformations:



1. $X_j = W_j$ and $Y_j = W_j + 0.2$

2. $X_j = \exp(W_j)$ and $Y_j = \exp(W_j) + 0.2$

3. $X_j = \log(|W_j|)$ and $Y_j = \log(|W_j|) + 0.2$

4. $X_j = W_j^2$ and $Y_j = W_j^2 + 0.2$

5. $X_j = \sqrt{|W_j|}$ and $Y_j = \sqrt{|W_j|} + 0.2$



For each of the three scenarios we evaluated the combination of several factors: independent
or dependent variables, $p = 50, 100, 500$, $n = 50, 100, 500$, and different percentages of
relevant variables for classification (100%, 50%, and 10%), for a total of $3 \times 2 \times 3 \times 3 \times 3 = 162$
different settings. The dependence structure between the variables has been introduced
by generating correlated variables $W_j$ ($j = 1, \ldots, p$) from a Gaussian distribution with
equicorrelated covariance matrix ($\rho = 0.2$). The irrelevant 'noise' variables have been generated
by taking the same base distribution as for the informative variables and leaving
out the additive constant.
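
The data-generating design of the third scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: the function names `equicorrelated_gaussian` and `simulate_scenario3` are hypothetical, unit variances are assumed for the Gaussian base variables, the equicorrelation is produced through a shared Gaussian factor, and the relevant variables are picked at random because the text does not say which positions carry the shift.

```python
import numpy as np

def equicorrelated_gaussian(n, p, rho, rng):
    """Draw n rows of a p-variate Gaussian with unit variances (assumed)
    and common pairwise correlation rho, via one shared factor."""
    shared = rng.standard_normal((n, 1))
    own = rng.standard_normal((n, p))
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own

# The five block transformations listed for the third scenario.
TRANSFORMS = [
    lambda w: w,
    np.exp,
    lambda w: np.log(np.abs(w)),
    np.square,
    lambda w: np.sqrt(np.abs(w)),
]

def simulate_scenario3(n, p, frac_relevant=1.0, rho=0.0, shift=0.2, seed=0):
    """Return (data, labels, relevant): two balanced classes of size n // 2."""
    rng = np.random.default_rng(seed)
    half = n // 2
    w = equicorrelated_gaussian(2 * half, p, rho, rng)
    x = np.empty_like(w)
    # Split the p columns into 5 (nearly) equal blocks, one per transformation.
    blocks = np.array_split(np.arange(p), len(TRANSFORMS))
    for cols, transform in zip(blocks, TRANSFORMS):
        x[:, cols] = transform(w[:, cols])
    labels = np.repeat([0, 1], half)
    # Assumption: relevant variables are chosen uniformly at random.
    n_relevant = int(round(frac_relevant * p))
    relevant = rng.choice(p, size=n_relevant, replace=False)
    # Class 1 receives the additive constant on relevant variables only;
    # noise variables keep the same base distribution without the shift.
    x[np.ix_(labels == 1, relevant)] += shift
    return x, labels, relevant
```

Calling `simulate_scenario3(50, 100, frac_relevant=0.5, rho=0.2)` would give one training set for the dependent setting with $n = 50$, $p = 100$ and 50% relevant variables; a second call with a different seed would give the matching test set.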

For each setting we simulated 100 data sets as training sets and 100 as test sets. The
pairs of data sets were split into the two balanced populations with sample size $n/2$.
The component-wise quantile-based classifier has been implemented in the R package
quantileDA (the package will be available on the CRAN R homepage soon). Data have
been preprocessed according to the skewness correction discussed in Section 2.3 using the
conventional skewness measure and Galton's robust version. In each setting we have
evaluated the classifier on a grid of equispaced values $\theta$ in $T = [r, 1 - r]$ with $r = 0.02$.
In general, $r$ could be tuned to the sample size $n$ as, say, $r = 5/n$. The optimal $\theta$ has
been chosen in each training set. In order to see which $\theta$-values were chosen depending
on the model setup, an average of these values has been computed across all 100 data
sets. The mean of the misclassification rates and the standard error of these means were
estimated from the classification results in the replicated test sets.
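
The selection of $\theta$ on a grid can be illustrated with a minimal Python sketch. It is not the quantileDA implementation. The component-wise quantile distance used here, $\theta |z - q|$ for $z > q$ and $(1 - \theta) |z - q|$ otherwise, is an assumption about the form of the distance defined earlier in the paper; the function names and the number of grid points (`n_grid`) are hypothetical; and ties in the training error are broken by taking the smallest $\theta$.

```python
import numpy as np

def quantile_distance(z, q, theta):
    """Sum over components of the asymmetric distance of z to the quantiles q
    (assumed form: weight theta above the quantile, 1 - theta below)."""
    diff = z - q
    weight = np.where(diff > 0, theta, 1.0 - theta)
    return (weight * np.abs(diff)).sum(axis=-1)

def fit_quantiles(x, y, theta):
    """Within-class, component-wise empirical theta-quantiles."""
    classes = np.unique(y)
    q = np.stack([np.quantile(x[y == c], theta, axis=0) for c in classes])
    return classes, q

def predict(x, classes, q, theta):
    """Assign each row of x to the class with the smallest summed distance."""
    d = np.stack([quantile_distance(x, qc, theta) for qc in q], axis=1)
    return classes[d.argmin(axis=1)]

def select_theta(x, y, r=0.02, n_grid=49):
    """Pick theta in [r, 1 - r] minimizing the training misclassification rate."""
    grid = np.linspace(r, 1.0 - r, n_grid)
    errors = []
    for theta in grid:
        classes, q = fit_quantiles(x, y, theta)
        errors.append(np.mean(predict(x, classes, q, theta) != y))
    best = int(np.argmin(errors))  # first minimizer on the grid
    return grid[best], errors[best]
```

The test error for one replicate would then be obtained by refitting the quantiles on the training set at the selected $\theta$ and calling `predict` on the corresponding test set.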

Tables 1-6 show the obtained results of the quantile classifiers with data preprocessed
according to the Galton and the Skewness corrections (QCG, QCS). The tables show the
average misclassification errors and the average of the optimal $\theta$ values across all the 100
data sets in each considered setting. Standard errors are reported in brackets.
We compared the quantile classifier misclassification rates with the ones obtained by nine
other classifiers: the component-wise centroid and median classifiers (CC, MC), Fisher's
linear discriminant analysis (LDA), the $k$-nearest-neighbor classifier (k-NN; Cover and Hart,
1967), the naive Bayes classifier (Hand and Yu, 2001), the support vector machine
(SVM; Cortes and Vapnik, 1995; Wang et al., 2008), the nearest-shrunken centroid
method (Tibshirani et al., 2002), penalized logistic regression (Park and Hastie, 2008)
and classification trees (rpart; Breiman et al., 1984). We used the R package MASS to
implement Fisher's LDA, the package class for k-NN with $k = 5$, the package e1071 for
the naive Bayes classifier and the SVM with the default settings,
the package pamr for the nearest-shrunken centroid with threshold set to 1, the package
stepPlr for penalized logistic regression with regularization parameter $\lambda = 1$, and the
package rpart for implementing the classification trees.
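
The two skewness measures behind the QCS and QCG variants can be written down compactly. The Python sketch below computes the conventional third standardized moment and Galton's quartile-based measure, $(Q_3 + Q_1 - 2 Q_2)/(Q_3 - Q_1)$. The helper `flip_negative_skew` is a hypothetical illustration of how such a measure might drive a sign-based preprocessing step; it is an assumption and should not be read as the exact correction of Section 2.3.

```python
import numpy as np

def moment_skewness(x):
    """Conventional skewness: third standardized moment, per column."""
    centered = x - x.mean(axis=0)
    return (centered**3).mean(axis=0) / (centered**2).mean(axis=0) ** 1.5

def galton_skewness(x):
    """Galton's robust skewness based on the quartiles, per column."""
    q1, q2, q3 = np.quantile(x, [0.25, 0.5, 0.75], axis=0)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def flip_negative_skew(x, measure=galton_skewness):
    """Assumed preprocessing: change the sign of columns whose skewness is
    negative so that all columns are skewed in the same direction."""
    signs = np.where(measure(x) < 0, -1.0, 1.0)
    return x * signs, signs
```

Because Galton's measure depends only on the quartiles, a single extreme observation changes it far less than it changes the moment-based measure, which is the usual argument for calling it robust.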

For all methods, the misclassification rates decrease as the sample size increases. For
the quantile classifier, the larger the sample size is, the more consistent the
choice of the optimal $\theta$ appears, and consequently the discriminative power of the method
increases. Not surprisingly, the classification performance worsens as the number of irrelevant
variables increases. For fixed sample size and percentage of relevant variables, the
methods seem to perform better as $p$ increases, in almost all settings.
To summarize and compare results of the different classifiers, we have computed the relative
performance of each classifier with respect to the Galton quantile classifier, whose
misclassification rates are taken as baseline. More specifically, we have transformed the misclassification
rate of each classifier into the difference between its error rate and the baseline error rate, divided by the average
error rate in the given setting. The distribution of these rescaled results (aggregated over
the different choices of $p$, $n$, dependence/independence and the percentage of relevant
variables) is represented in the boxplots of Figure 3.
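
As a small worked illustration of this rescaling, the following Python sketch computes the relative performance for a set of classifiers. It assumes that the "average error rate in the given setting" is the mean over all classifiers in that setting, which the text does not spell out; the function name `relative_performance` and the numbers in the usage note are hypothetical.

```python
import numpy as np

def relative_performance(errors, baseline="QCG"):
    """errors maps classifier name -> misclassification rates, one per setting.
    Returns (error - baseline error) / average error, per classifier and setting."""
    methods = list(errors)
    table = np.array([errors[m] for m in methods], dtype=float)
    # Assumption: the average is taken over classifiers within each setting.
    average = table.mean(axis=0)
    rel = (table - table[methods.index(baseline)]) / average
    return dict(zip(methods, rel))
```

With hypothetical rates `{"QCG": [0.1, 0.2], "LDA": [0.3, 0.2]}`, the per-setting averages are 0.2 and 0.2, so LDA is mapped to 1.0 and 0.0 while the baseline is mapped to zero in both settings. Positive values therefore indicate a classifier that is worse than the baseline.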

Results indicate that the component-wise quantile classifier performs very well in most situations
compared to the other classifiers. The skewness correction according to the conventional
third standardized moment seems to produce a slightly better classification performance
in the asymmetric setups. However, the Galton skewness correction is preferable
when analyzing real data that are more affected by outliers, as will be shown in the next section.
In the scenarios with symmetric variables, the performance of the quantile classifiers is
similar to that of the centroid and the median classifier, and this is consistent with the
chosen optimal value of $\theta$, which is on average close to the midpoint 0.5. In the scenarios
with asymmetric variables and differing distributions of variables, the quantile classifier
outperforms the other methods. In fact, in these cases, the optimal $\theta$ is almost always
far below the midpoint, provided that the sample size and the number of relevant variables
are sufficiently high.



[Figure 3: boxplots in three panels titled "Identically distributed symmetric variables", "Identically distributed asymmetric variables" and "Differing distributions of variables"; one box per classifier: Quantile (Skewness), Centroid, Median, LDA, knn, nBayes, SVM, NSC, Pen. log. regression, rpart.]

Figure 3: Relative performance of the classifiers with respect to the quantile classifier 
with Galton skewness correction taken as baseline. 






Table 1: Simulation study: independent identically distributed symmetric variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2 and
4 contain the mean of the chosen values of $\theta$ in the training sets.

n = 50
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.17 (0.06)  0.28 (0.06)  0.42 (0.06)  0.10 (0.05)  0.20 (0.07)  0.41 (0.07)  0.02 (0.02)  0.06 (0.04)  0.32 (0.08)
θ Galton    0.46 (0.18)  0.44 (0.18)  0.44 (0.17)  0.46 (0.13)  0.44 (0.14)  0.43 (0.11)  0.48 (0.03)  0.48 (0.02)  0.48 (0.03)
QCS         0.17 (0.08)  0.28 (0.07)  0.42 (0.06)  0.10 (0.06)  0.21 (0.07)  0.41 (0.06)  0.02 (0.02)  0.06 (0.03)  0.31 (0.07)
θ Skewn.    0.49 (0.10)  0.41 (0.18)  0.43 (0.19)  0.40 (0.12)  0.43 (0.15)  0.44 (0.15)  0.43 (0.06)  0.43 (0.03)  0.44 (0.03)
CC          0.16 (0.08)  0.27 (0.07)  0.43 (0.05)  0.10 (0.07)  0.22 (0.10)  0.42 (0.06)  0.04 (0.08)  0.13 (0.14)  0.37 (0.08)
MC          0.17 (0.05)  0.27 (0.06)  0.42 (0.05)  0.10 (0.05)  0.19 (0.06)  0.42 (0.05)  0.02 (0.02)  0.06 (0.04)  0.32 (0.07)
LDA         0.38 (0.07)  0.41 (0.07)  0.43 (0.05)  0.23 (0.07)  0.30 (0.07)  0.43 (0.05)  0.26 (0.09)  0.36 (0.08)  0.43 (0.05)
knn         0.19 (0.08)  0.31 (0.08)  0.44 (0.05)  0.14 (0.08)  0.26 (0.09)  0.44 (0.05)  0.08 (0.12)  0.23 (0.14)  0.42 (0.07)
n-Bayes     0.34 (0.11)  0.42 (0.07)  0.45 (0.04)  0.31 (0.14)  0.40 (0.09)  0.44 (0.05)  0.28 (0.15)  0.36 (0.13)  0.44 (0.05)
SVM         0.15 (0.05)  0.26 (0.07)  0.42 (0.05)  0.10 (0.04)  0.19 (0.07)  0.42 (0.06)  0.06 (0.04)  0.11 (0.07)  0.42 (0.07)
NSC         0.29 (0.08)  0.36 (0.08)  0.42 (0.06)  0.23 (0.07)  0.31 (0.08)  0.41 (0.06)  0.10 (0.06)  0.18 (0.08)  0.36 (0.08)
stepPlr     0.14 (0.05)  0.25 (0.06)  0.41 (0.05)  0.07 (0.04)  0.15 (0.05)  0.39 (0.06)  0.01 (0.01)  0.03 (0.03)  0.24 (0.06)


ri>aiL 





39 


(0.05) 





11 


(O.Oli) 


0. 11 


(0.05) 





10 


(().()()) 


0.39 


(0.07) 





13 


(0.0.")) 





10 


l_0.()()) 





11 


(().()()) 


0. 12 


(O.lKi) 


























n = 100
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.15 (0.04)  0.25 (0.05)  0.42 (0.04)  0.09 (0.04)  0.18 (0.04)  0.40 (0.05)  0.01 (0.01)  0.04 (0.02)  0.26 (0.04)
θ Galton    0.43 (0.15)  0.43 (0.16)  0.42 (0.15)  0.47 (0.18)  0.44 (0.16)  0.44 (0.13)  0.49 (0.05)  0.48 (0.04)  0.48 (0.05)
QCS         0.16 (0.04)  0.25 (0.04)  0.42 (0.04)  0.10 (0.04)  0.18 (0.05)  0.40 (0.05)  0.01 (0.01)  0.04 (0.02)  0.25 (0.04)
θ Skewn.    0.44 (0.17)  0.47 (0.16)  0.47 (0.19)  0.42 (0.17)  0.45 (0.16)  0.47 (0.14)  0.46 (0.05)  0.46 (0.03)  0.47 (0.05)
CC          0.13 (0.06)  0.22 (0.06)  0.42 (0.05)  0.07 (0.03)  0.16 (0.06)  0.37 (0.06)  0.01 (0.01)  0.04 (0.05)  0.30 (0.08)
MC          0.14 (0.04)  0.23 (0.05)  0.41 (0.05)  0.08 (0.03)  0.16 (0.03)  0.37 (0.05)  0.01 (0.01)  0.04 (0.02)  0.26 (0.05)
LDA         0.18 (0.05)  0.27 (0.05)  0.43 (0.04)  0.35 (0.08)  0.39 (0.06)  0.45 (0.04)  0.12 (0.04)  0.22 (0.05)  0.42 (0.05)
knn         0.15 (0.04)  0.26 (0.06)  0.43 (0.05)  0.09 (0.03)  0.21 (0.07)  0.42 (0.05)  0.02 (0.02)  0.14 (0.10)  0.41 (0.07)
n-Bayes     0.33 (0.11)  0.39 (0.09)  0.46 (0.03)  0.29 (0.14)  0.39 (0.09)  0.45 (0.04)  0.26 (0.17)  0.33 (0.15)  0.45 (0.04)
SVM         0.12 (0.04)  0.20 (0.04)  0.40 (0.05)  0.08 (0.03)  0.14 (0.03)  0.37 (0.06)  0.04 (0.02)  0.06 (0.03)  0.32 (0.09)
NSC         0.25 (0.06)  0.30 (0.05)  0.41 (0.05)  0.17 (0.04)  0.26 (0.06)  0.38 (0.06)  0.04 (0.03)  0.10 (0.04)  0.28 (0.07)
stepPlr     0.13 (0.04)  0.23 (0.04)  0.42 (0.05)  0.06 (0.03)  0.14 (0.03)  0.36 (0.05)  0.01 (0.01)  0.02 (0.01)  0.20 (0.04)
rpart       0.38 (0.05)  0.40 (0.05)  0.44 (0.04)  0.39 (0.05)  0.40 (0.04)  0.43 (0.05)  0.38 (0.05)  0.39 (0.05)  0.41 (0.05)


























n = 500
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.13 (0.02)  0.20 (0.02)  0.38 (0.02)  0.07 (0.01)  0.13 (0.01)  0.33 (0.02)  0.01 (0.01)  0.03 (0.01)  0.17 (0.02)
θ Galton    0.43 (0.11)  0.43 (0.10)  0.42 (0.10)  0.44 (0.13)  0.44 (0.11)  0.43 (0.10)  0.48 (0.20)  0.45 (0.19)  0.44 (0.10)
QCS         0.13 (0.02)  0.20 (0.02)  0.37 (0.03)  0.07 (0.01)  0.13 (0.01)  0.32 (0.02)  0.01 (0.01)  0.03 (0.01)  0.17 (0.02)
θ Skewn.    0.53 (0.12)  0.50 (0.10)  0.48 (0.13)  0.48 (0.13)  0.50 (0.12)  0.50 (0.11)  0.49 (0.17)  0.51 (0.17)  0.49 (0.13)
CC          0.09 (0.01)  0.17 (0.02)  0.36 (0.03)  0.06 (0.04)  0.10 (0.01)  0.30 (0.02)  0.01 (0.00)  0.02 (0.01)  0.16 (0.06)
MC          0.12 (0.02)  0.19 (0.02)  0.36 (0.02)  0.07 (0.01)  0.13 (0.02)  0.32 (0.02)  0.01 (0.00)  0.03 (0.01)  0.16 (0.02)
LDA         0.11 (0.02)  0.19 (0.02)  0.36 (0.03)  0.07 (0.01)  0.13 (0.02)  0.32 (0.02)  0.34 (0.06)  0.38 (0.05)  0.45 (0.03)
knn         0.11 (0.01)  0.21 (0.02)  0.41 (0.03)  0.06 (0.01)  0.14 (0.02)  0.38 (0.03)  0.01 (0.00)  0.04 (0.02)  0.33 (0.07)
n-Bayes     0.29 (0.12)  0.36 (0.09)  0.47 (0.03)  0.26 (0.15)  0.37 (0.11)  0.46 (0.04)  0.21 (0.17)  0.35 (0.15)  0.46 (0.05)
SVM         0.10 (0.01)  0.17 (0.02)  0.34 (0.02)  0.06 (0.01)  0.11 (0.01)  0.29 (0.02)  0.03 (0.01)  0.03 (0.01)  0.13 (0.02)
NSC         0.12 (0.02)  0.19 (0.02)  0.33 (0.03)  0.07 (0.03)  0.12 (0.02)  0.28 (0.02)  0.01 (0.00)  0.02 (0.01)  0.13 (0.02)
stepPlr     0.12 (0.02)  0.19 (0.02)  0.36 (0.03)  0.07 (0.01)  0.11 (0.02)  0.32 (0.02)  0.01 (0.00)  0.02 (0.01)  0.11 (0.02)


il)ail 





35 


^().02) 





o(i 


(0.02) 


0. 11 


(0.03) 





.3.') 


(0.02) 


().3() 


(0.02) 





10 


(0.03) 





35 


(0.02) 







(0.02) 


0.38 


^().02) 






Table 2: Simulation study: dependent identically distributed symmetric variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2 and
4 contain the mean of the chosen values of $\theta$ in the training sets.

n = 50
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.27 (0.09)  0.32 (0.08)  0.42 (0.06)  0.24 (0.07)  0.27 (0.07)  0.41 (0.06)  0.21 (0.07)  0.21 (0.06)  0.35 (0.08)
θ Galton    0.37 (0.25)  0.43 (0.23)  0.47 (0.18)  0.39 (0.30)  0.42 (0.23)  0.44 (0.16)  0.36 (0.31)  0.42 (0.19)  0.46 (0.09)
QCS         0.27 (0.08)  0.31 (0.08)  0.43 (0.05)  0.24 (0.08)  0.27 (0.08)  0.41 (0.06)  0.22 (0.06)  0.22 (0.07)  0.36 (0.07)
θ Skewn.    0.34 (0.25)  0.44 (0.25)  0.40 (0.19)  0.30 (0.26)  0.37 (0.23)  0.41 (0.14)  0.22 (0.22)  0.36 (0.18)  0.43 (0.09)
CC          0.24 (0.07)  0.31 (0.07)  0.43 (0.05)  0.21 (0.07)  0.27 (0.08)  0.43 (0.06)  0.19 (0.06)  0.23 (0.09)  0.40 (0.08)
MC          0.24 (0.06)  0.29 (0.06)  0.42 (0.05)  0.20 (0.06)  0.25 (0.06)  0.40 (0.06)  0.18 (0.05)  0.21 (0.06)  0.35 (0.07)
LDA         0.43 (0.05)  0.41 (0.06)  0.43 (0.05)  0.32 (0.07)  0.34 (0.08)  0.43 (0.05)  0.22 (0.06)  0.33 (0.07)  0.43 (0.05)
knn         0.27 (0.06)  0.33 (0.08)  0.43 (0.05)  0.25 (0.07)  0.31 (0.08)  0.44 (0.05)  0.24 (0.08)  0.30 (0.09)  0.44 (0.06)
n-Bayes     0.35 (0.08)  0.41 (0.06)  0.45 (0.04)  0.34 (0.10)  0.40 (0.08)  0.45 (0.04)  0.34 (0.10)  0.37 (0.09)  0.44 (0.05)
SVM         0.24 (0.06)  0.29 (0.07)  0.42 (0.05)  0.23 (0.07)  0.26 (0.07)  0.42 (0.06)  0.21 (0.06)  0.22 (0.07)  0.41 (0.07)
NSC         0.32 (0.07)  0.37 (0.07)  0.43 (0.06)  0.29 (0.07)  0.33 (0.06)  0.43 (0.06)  0.22 (0.06)  0.25 (0.07)  0.39 (0.07)
stepPlr     0.28 (0.07)  0.29 (0.06)  0.41 (0.06)  0.24 (0.07)  0.23 (0.07)  0.38 (0.07)  0.19 (0.05)  0.12 (0.06)  0.28 (0.07)


ri>aiL 





39 


(0.()()) 





11 


(O.Oli) 


0. 1.3 


^().o5) 





31) 


(().()()) 


0. 11 


(0.07) 





11 


(O.O.j) 





10 


l_0.()()) 





10 


(0.06) 


0. 13 


(0.05) 


























n = 100
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.26 (0.05)  0.30 (0.05)  0.43 (0.04)  0.23 (0.06)  0.26 (0.06)  0.40 (0.05)  0.20 (0.05)  0.21 (0.06)  0.31 (0.06)
θ Galton    0.40 (0.24)  0.41 (0.21)  0.43 (0.17)  0.41 (0.30)  0.42 (0.22)  0.43 (0.14)  0.38 (0.33)  0.36 (0.25)  0.46 (0.12)
QCS         0.26 (0.05)  0.30 (0.06)  0.43 (0.04)  0.24 (0.06)  0.25 (0.06)  0.40 (0.04)  0.22 (0.06)  0.21 (0.05)  0.30 (0.06)
θ Skewn.    0.42 (0.24)  0.47 (0.23)  0.47 (0.21)  0.37 (0.29)  0.42 (0.23)  0.47 (0.17)  0.35 (0.34)  0.38 (0.28)  0.47 (0.14)
CC          0.21 (0.04)  0.27 (0.04)  0.42 (0.05)  0.20 (0.04)  0.24 (0.05)  0.39 (0.05)  0.19 (0.06)  0.20 (0.05)  0.33 (0.09)
MC          0.22 (0.05)  0.28 (0.04)  0.42 (0.04)  0.20 (0.04)  0.23 (0.04)  0.38 (0.05)  0.18 (0.04)  0.19 (0.04)  0.30 (0.06)
LDA         0.32 (0.05)  0.31 (0.05)  0.42 (0.05)  0.44 (0.05)  0.42 (0.06)  0.45 (0.04)  0.23 (0.04)  0.26 (0.05)  0.41 (0.04)
knn         0.24 (0.05)  0.31 (0.06)  0.44 (0.04)  0.22 (0.04)  0.27 (0.05)  0.43 (0.05)  0.21 (0.05)  0.24 (0.07)  0.42 (0.06)
n-Bayes     0.35 (0.09)  0.41 (0.07)  0.46 (0.03)  0.35 (0.10)  0.40 (0.09)  0.46 (0.03)  0.34 (0.09)  0.36 (0.10)  0.46 (0.04)
SVM         0.23 (0.04)  0.26 (0.05)  0.40 (0.05)  0.21 (0.04)  0.22 (0.04)  0.38 (0.05)  0.20 (0.04)  0.14 (0.04)  0.35 (0.07)
NSC         0.28 (0.05)  0.32 (0.06)  0.42 (0.05)  0.24 (0.05)  0.29 (0.06)  0.40 (0.06)  0.20 (0.05)  0.22 (0.04)  0.30 (0.06)
stepPlr     0.28 (0.05)  0.27 (0.05)  0.41 (0.05)  0.24 (0.04)  0.21 (0.04)  0.36 (0.05)  0.20 (0.04)  0.08 (0.03)  0.21 (0.05)
rpart       0.39 (0.05)  0.40 (0.05)  0.45 (0.04)  0.38 (0.05)  0.40 (0.05)  0.44 (0.04)  0.38 (0.04)  0.40 (0.05)  0.43 (0.04)


























n = 500
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.22 (0.02)  0.25 (0.02)  0.38 (0.03)  0.20 (0.02)  0.22 (0.02)  0.34 (0.03)  0.18 (0.02)  0.19 (0.02)  0.24 (0.04)
θ Galton    0.41 (0.15)  0.43 (0.11)  0.44 (0.12)  0.39 (0.13)  0.39 (0.13)  0.47 (0.14)  0.41 (0.25)  0.43 (0.25)  0.45 (0.15)
QCS         0.22 (0.02)  0.26 (0.02)  0.38 (0.02)  0.20 (0.02)  0.23 (0.02)  0.34 (0.03)  0.18 (0.02)  0.19 (0.02)  0.24 (0.04)
θ Skewn.    0.46 (0.15)  0.50 (0.13)  0.49 (0.13)  0.49 (0.19)  0.48 (0.18)  0.51 (0.13)  0.46 (0.25)  0.46 (0.28)  0.49 (0.18)
CC          0.21 (0.02)  0.24 (0.02)  0.37 (0.03)  0.19 (0.02)  0.22 (0.03)  0.33 (0.03)  0.18 (0.02)  0.18 (0.02)  0.23 (0.05)
MC          0.22 (0.02)  0.25 (0.02)  0.37 (0.02)  0.20 (0.02)  0.22 (0.02)  0.33 (0.03)  0.18 (0.02)  0.18 (0.02)  0.23 (0.04)
LDA         0.25 (0.02)  0.23 (0.02)  0.36 (0.03)  0.26 (0.02)  0.18 (0.02)  0.32 (0.03)  0.47 (0.02)  0.41 (0.04)  0.45 (0.03)
knn         0.23 (0.02)  0.26 (0.02)  0.41 (0.03)  0.21 (0.02)  0.23 (0.02)  0.39 (0.03)  0.19 (0.02)  0.18 (0.03)  0.34 (0.05)
n-Bayes     0.30 (0.08)  0.38 (0.07)  0.47 (0.03)  0.32 (0.09)  0.38 (0.09)  0.46 (0.04)  0.31 (0.10)  0.37 (0.10)  0.47 (0.04)
SVM         0.22 (0.02)  0.21 (0.02)  0.34 (0.02)  0.20 (0.02)  0.16 (0.02)  0.29 (0.02)  0.18 (0.02)  0.06 (0.01)  0.14 (0.02)
NSC         0.22 (0.02)  0.25 (0.02)  0.34 (0.03)  0.20 (0.02)  0.22 (0.02)  0.31 (0.03)  0.18 (0.02)  0.18 (0.02)  0.22 (0.02)
stepPlr     0.25 (0.02)  0.23 (0.02)  0.36 (0.03)  0.26 (0.02)  0.19 (0.02)  0.32 (0.03)  0.23 (0.02)  0.05 (0.01)  0.15 (0.02)


il)ail 





3() 


^o.o2) 







{().{)2) 


0. 12 


(0.02) 





3(i 


(0.02) 


0.37 


^().02) 





10 


(0.03) 





3() 


(0.02) 







(j).Q2) 


0.39 


^().0:i) 






Table 3: Simulation study: independent identically distributed asymmetric variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2
and 4 contain the mean of the chosen values of $\theta$ in the training sets.

n = 50
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.25 (0.09)  0.36 (0.08)  0.43 (0.05)  0.26 (0.10)  0.36 (0.09)  0.44 (0.05)  0.26 (0.07)  0.35 (0.07)  0.45 (0.04)
θ Galton    0.18 (0.16)  0.28 (0.26)  0.46 (0.31)  0.35 (0.27)  0.44 (0.28)  0.60 (0.26)  0.48 (0.23)  0.52 (0.20)  0.61 (0.11)
QCS         0.20 (0.07)  0.28 (0.08)  0.42 (0.05)  0.21 (0.08)  0.24 (0.07)  0.42 (0.06)  0.27 (0.10)  0.26 (0.07)  0.30 (0.09)
θ Skewn.    0.06 (0.05)  0.08 (0.10)  0.29 (0.30)  0.06 (0.10)  0.10 (0.18)  0.38 (0.38)  0.06 (0.10)  0.05 (0.09)  0.15 (0.30)
CC          0.43 (0.05)  0.44 (0.05)  0.46 (0.04)  0.43 (0.05)  0.43 (0.05)  0.44 (0.05)  0.39 (0.06)  0.43 (0.05)  0.45 (0.04)
MC          0.38 (0.07)  0.43 (0.05)  0.44 (0.05)  0.34 (0.07)  0.40 (0.06)  0.45 (0.04)  0.17 (0.06)  0.30 (0.07)  0.43 (0.05)
LDA         0.44 (0.05)  0.44 (0.04)  0.44 (0.05)  0.44 (0.04)  0.43 (0.04)  0.45 (0.04)  0.43 (0.05)  0.44 (0.05)  0.45 (0.04)
knn         0.45 (0.05)  0.46 (0.03)  0.46 (0.04)  0.44 (0.04)  0.45 (0.04)  0.45 (0.04)  0.45 (0.05)  0.46 (0.04)  0.46 (0.03)
n-Bayes     0.44 (0.04)  0.44 (0.05)  0.44 (0.05)  0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.44 (0.05)  0.44 (0.05)  0.44 (0.05)
SVM         0.43 (0.04)  0.44 (0.05)  0.44 (0.05)  0.43 (0.05)  0.43 (0.04)  0.45 (0.04)  0.39 (0.06)  0.43 (0.05)  0.45 (0.04)
NSC         0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.45 (0.05)  0.44 (0.05)  0.44 (0.04)  0.43 (0.06)  0.43 (0.05)  0.45 (0.04)


stopPlr 


0.13 


(ens') 





11 


(om) 





11 


(0.01') 





11 


(0.05) 





13 


(0.05) 





11 


(0.05) 





38 


(0.07) 





12 


(0.05) 





15 


(0.01) 


ri>aiL 


0. 12 


^o.()5) 





12 


(O.Oli) 





11 


^o.ol) 





12 


(O.Oli) 





12 


(0.()(>) 





11 


(O.O.j) 





11 


l^O.OG) 





12 


(O.Oli) 





11 


^o.l)5) 



n = 100
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.09 (0.04)  0.18 (0.05)  0.42 (0.05)  0.07 (0.04)  0.14 (0.05)  0.37 (0.06)  0.05 (0.07)  0.14 (0.11)  0.42 (0.06)
θ Galton    0.04 (0.02)  0.04 (0.03)  0.19 (0.25)  0.05 (0.06)  0.06 (0.06)  0.17 (0.23)  0.27 (0.20)  0.29 (0.24)  0.56 (0.25)
QCS         0.09 (0.03)  0.17 (0.04)  0.41 (0.05)  0.06 (0.03)  0.12 (0.04)  0.35 (0.05)  0.01 (0.02)  0.06 (0.07)  0.27 (0.10)
θ Skewn.    0.04 (0.02)  0.03 (0.02)  0.13 (0.21)  0.03 (0.02)  0.04 (0.02)  0.10 (0.17)  0.17 (0.15)  0.11 (0.17)  0.17 (0.31)
CC          0.43 (0.05)  0.45 (0.03)  0.46 (0.03)  0.41 (0.05)  0.44 (0.04)  0.46 (0.03)  0.33 (0.05)  0.40 (0.05)  0.46 (0.03)
MC          0.34 (0.06)  0.41 (0.05)  0.46 (0.04)  0.30 (0.04)  0.38 (0.06)  0.45 (0.03)  0.11 (0.03)  0.25 (0.04)  0.43 (0.04)
LDA         0.44 (0.04)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.04)  0.46 (0.03)  0.41 (0.05)  0.44 (0.04)  0.45 (0.03)
knn         0.45 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.04)  0.46 (0.03)  0.47 (0.03)
n-Bayes     0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)
SVM         0.42 (0.04)  0.45 (0.04)  0.46 (0.03)  0.41 (0.04)  0.44 (0.05)  0.46 (0.03)  0.34 (0.05)  0.41 (0.05)  0.46 (0.03)
NSC         0.45 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.04)  0.46 (0.04)  0.46 (0.03)  0.42 (0.05)  0.44 (0.04)  0.46 (0.03)
stepPlr     0.43 (0.04)  0.46 (0.03)  0.46 (0.03)  0.42 (0.05)  0.45 (0.04)  0.46 (0.03)  0.33 (0.05)  0.40 (0.05)  0.46 (0.03)
rpart       0.36 (0.06)  0.39 (0.06)  0.44 (0.05)  0.37 (0.06)  0.38 (0.07)  0.43 (0.05)  0.37 (0.06)  0.38 (0.06)  0.43 (0.05)


























n = 500
            p = 50                                 p = 100                                p = 500
            100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG         0.02 (0.01)  0.09 (0.01)  0.34 (0.03)  0.00 (0.00)  0.03 (0.01)  0.26 (0.02)  0.00 (0.00)  0.00 (0.00)  0.06 (0.01)
θ Galton    0.02 (0.01)  0.02 (0.01)  0.05 (0.03)  0.02 (0.01)  0.02 (0.01)  0.03 (0.02)  0.17 (0.04)  0.02 (0.00)  0.03 (0.01)
QCS         0.02 (0.01)  0.09 (0.01)  0.34 (0.03)  0.00 (0.00)  0.03 (0.01)  0.26 (0.02)  0.00 (0.00)  0.00 (0.00)  0.06 (0.01)
θ Skewn.    0.02 (0.01)  0.02 (0.01)  0.05 (0.03)  0.02 (0.01)  0.02 (0.01)  0.03 (0.02)  0.17 (0.04)  0.02 (0.00)  0.03 (0.01)
CC          0.40 (0.02)  0.44 (0.02)  0.48 (0.02)  0.36 (0.02)  0.42 (0.02)  0.48 (0.02)  0.22 (0.02)  0.33 (0.02)  0.46 (0.02)
MC          0.31 (0.02)  0.37 (0.02)  0.46 (0.02)  0.23 (0.02)  0.32 (0.02)  0.45 (0.02)  0.05 (0.01)  0.15 (0.02)  0.38 (0.02)
LDA         0.41 (0.02)  0.44 (0.02)  0.48 (0.02)  0.37 (0.02)  0.42 (0.02)  0.48 (0.02)  0.47 (0.02)  0.48 (0.02)  0.48 (0.02)
knn         0.46 (0.02)  0.47 (0.02)  0.48 (0.01)  0.45 (0.02)  0.47 (0.02)  0.48 (0.01)  0.43 (0.03)  0.46 (0.02)  0.48 (0.01)
n-Bayes     0.47 (0.02)  0.48 (0.02)  0.48 (0.01)  0.47 (0.02)  0.48 (0.01)  0.48 (0.01)  0.47 (0.02)  0.48 (0.02)  0.48 (0.01)
SVM         0.37 (0.02)  0.43 (0.02)  0.48 (0.02)  0.33 (0.02)  0.41 (0.02)  0.47 (0.02)  0.19 (0.02)  0.32 (0.02)  0.46 (0.02)
NSC         0.45 (0.02)  0.46 (0.02)  0.48 (0.02)  0.43 (0.02)  0.45 (0.02)  0.48 (0.02)  0.36 (0.03)  0.40 (0.02)  0.46 (0.02)


StepPlr 


0.11 


(0.02) 





11 


(0.02) 





18 


(0.02) 





37 


(0.02) 


0.12 


(0.03) 





18 


(0.02) 





27 


(0.02) 


0.36 


(0.02) 





17 


(0.02) 


il)ai I 


0.21 


^l).0:i) 





22 


^o.l):i) 





:-!(> 


(0.02) 





22 


(0.03) 


0.21 


^l).l)3) 





28 


(OA)L) 


1) 


23 


^l).l)3) 


0.23 


(QXYA) 


1) 


22 


^l).0:i) 






Table 4: Simulation study: dependent identically distributed asymmetric variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2 and
4 contain the mean of the chosen values of θ in the training sets.

n = 50
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.36 (0.10)  0.40 (0.08)  0.44 (0.04)  0.37 (0.09)  0.38 (0.08)  0.44 (0.04)  0.39 (0.08)  0.37 (0.07)  0.43 (0.04)
θ Galton  0.30 (0.35)  0.34 (0.33)  0.48 (0.28)  0.44 (0.40)  0.38 (0.33)  0.53 (0.34)  0.53 (0.42)  0.34 (0.37)  0.34 (0.35)
QCS       0.29 (0.11)  0.33 (0.10)  0.43 (0.05)  0.33 (0.11)  0.32 (0.11)  0.42 (0.06)  0.37 (0.12)  0.31 (0.13)  0.38 (0.08)
θ Skewn.  0.18 (0.29)  0.19 (0.27)  0.33 (0.28)  0.34 (0.38)  0.28 (0.33)  0.33 (0.31)  0.54 (0.40)  0.35 (0.35)  0.30 (0.35)
CC        0.45 (0.04)  0.44 (0.05)  0.45 (0.04)  0.44 (0.05)  0.45 (0.05)  0.45 (0.05)  0.44 (0.06)  0.44 (0.05)  0.45 (0.05)
MC        0.41 (0.06)  0.43 (0.05)  0.44 (0.04)  0.41 (0.06)  0.43 (0.05)  0.45 (0.04)  0.41 (0.06)  0.41 (0.06)  0.44 (0.05)
LDA       0.45 (0.04)  0.44 (0.04)  0.45 (0.04)  0.44 (0.05)  0.44 (0.05)  0.45 (0.04)  0.44 (0.05)  0.44 (0.04)  0.44 (0.04)
knn       0.45 (0.04)  0.45 (0.04)  0.44 (0.04)  0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.46 (0.04)  0.46 (0.04)  0.46 (0.04)
n-Bayes   0.44 (0.04)  0.45 (0.05)  0.45 (0.04)  0.44 (0.04)  0.45 (0.05)  0.44 (0.05)  0.44 (0.05)  0.45 (0.05)  0.44 (0.04)
SVM       0.43 (0.06)  0.44 (0.05)  0.44 (0.05)  0.43 (0.05)  0.45 (0.04)  0.44 (0.04)  0.43 (0.05)  0.44 (0.04)  0.44 (0.05)
NSC       0.46 (0.03)  0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.45 (0.04)  0.43 (0.06)  0.44 (0.05)  0.44 (0.05)
stepPlr   0.44 (0.05)  0.14 (0.05)  0.41 (0.05)  0.14 (0.05)  0.41 (0.01)  0.11 (0.04)  0.41 (0.05)  0.12 (0.05)  0.41 (0.05)
rpart     ?    (?)     0.13 (?)     0.11 (0.05)  0.13 (?)     0.11 (?)     0.11 (0.01)  0.13 (0.05)  0.13 (?)     0.13 (0.05)
(? marks an entry that is illegible in the source.)

n = 100
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.19 (0.05)  0.25 (0.07)  0.42 (0.05)  0.20 (0.09)  0.25 (0.09)  0.41 (0.06)  0.34 (0.14)  0.32 (0.11)  0.38 (0.08)
θ Galton  0.03 (0.06)  0.05 (0.12)  0.27 (0.31)  0.07 (0.18)  0.10 (0.22)  0.30 (0.33)  0.51 (0.42)  0.42 (0.38)  0.32 (0.33)
QCS       0.18 (0.05)  0.24 (0.06)  0.42 (0.05)  0.17 (0.07)  0.22 (0.08)  0.40 (0.06)  0.33 (0.16)  0.31 (0.14)  0.34 (0.10)
θ Skewn.  0.03 (0.06)  0.05 (0.12)  0.24 (0.30)  0.05 (0.15)  0.09 (0.19)  0.23 (0.30)  0.51 (0.42)  0.45 (0.38)  0.31 (0.34)
CC        0.44 (0.04)  0.45 (0.03)  0.46 (0.03)  0.45 (0.04)  0.45 (0.04)  0.46 (0.03)  0.44 (0.04)  0.45 (0.04)  0.46 (0.03)
MC        0.41 (0.05)  0.43 (0.05)  0.46 (0.03)  0.41 (0.05)  0.43 (0.05)  0.46 (0.03)  0.41 (0.05)  0.41 (0.06)  0.45 (0.04)
LDA       0.46 (0.03)  0.45 (0.04)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.03)  0.45 (0.03)  0.46 (0.04)  0.45 (0.03)
knn       0.45 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.04)  0.46 (0.04)  0.46 (0.03)  0.46 (0.04)  0.46 (0.04)  0.46 (0.03)
n-Bayes   0.46 (0.03)  0.46 (0.03)  0.47 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.03)  0.46 (0.03)  0.46 (0.03)
SVM       0.44 (0.04)  0.45 (0.04)  0.45 (0.04)  0.43 (0.05)  0.45 (0.04)  0.46 (0.03)  0.42 (0.05)  0.43 (0.05)  0.46 (0.03)
NSC       0.46 (0.03)  0.47 (0.03)  0.47 (0.03)  0.46 (0.03)  0.46 (0.03)  0.46 (0.03)  0.45 (0.04)  0.45 (0.04)  0.47 (0.03)
stepPlr   0.46 (0.03)  0.45 (0.04)  0.46 (0.03)  0.45 (0.03)  0.44 (0.04)  0.46 (0.03)  0.45 (0.03)  0.40 (0.05)  0.45 (0.04)
rpart     0.40 (0.05)  0.42 (0.04)  0.45 (0.04)  0.41 (0.05)  0.42 (0.05)  0.44 (0.04)  0.41 (0.05)  0.41 (0.05)  0.43 (0.05)

n = 500
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.14 (0.01)  0.18 (0.02)  0.36 (0.03)  0.12 (0.01)  0.15 (0.02)  0.29 (0.02)  0.11 (0.01)  0.12 (0.05)  0.19 (0.07)
θ Galton  0.02 (0.00)  0.02 (0.00)  0.06 (0.05)  0.02 (0.00)  0.02 (0.00)  0.04 (0.03)  0.02 (0.00)  0.04 (0.12)  0.10 (0.15)
QCS       0.14 (0.01)  0.18 (0.02)  0.36 (0.03)  0.12 (0.01)  0.15 (0.02)  0.29 (0.02)  0.11 (0.01)  0.12 (0.05)  0.19 (0.07)
θ Skewn.  0.02 (0.00)  0.02 (0.00)  0.06 (0.05)  0.02 (0.00)  0.02 (0.00)  0.04 (0.03)  0.02 (0.00)  0.04 (0.12)  0.10 (0.15)
CC        0.46 (0.02)  0.46 (0.02)  0.48 (0.02)  0.45 (0.02)  0.45 (0.02)  0.48 (0.02)  0.45 (0.02)  0.44 (0.03)  0.47 (0.02)
MC        0.42 (0.02)  0.43 (0.02)  0.47 (0.02)  0.42 (0.02)  0.41 (0.02)  0.46 (0.02)  0.41 (0.02)  0.40 (0.04)  0.44 (0.04)
LDA       0.47 (0.02)  0.45 (0.02)  0.48 (0.01)  0.48 (0.02)  0.43 (0.02)  0.47 (0.02)  0.48 (0.01)  0.47 (0.02)  0.48 (0.01)
knn       0.46 (0.02)  0.46 (0.02)  0.48 (0.02)  0.47 (0.02)  0.47 (0.02)  0.48 (0.01)  0.48 (0.01)  0.47 (0.02)  0.48 (0.02)
n-Bayes   0.47 (0.02)  0.48 (0.01)  0.48 (0.01)  0.47 (0.02)  0.48 (0.02)  0.48 (0.01)  0.47 (0.02)  0.47 (0.02)  0.48 (0.01)
SVM       0.44 (0.02)  0.43 (0.02)  0.47 (0.02)  0.43 (0.02)  0.42 (0.02)  0.47 (0.02)  0.41 (0.02)  0.36 (0.02)  0.45 (0.02)
NSC       0.46 (0.02)  0.47 (0.02)  0.48 (0.02)  0.46 (0.02)  0.46 (0.02)  0.48 (0.02)  0.45 (0.02)  0.45 (0.02)  0.47 (0.02)
stepPlr   0.47 (0.02)  0.15 (0.02)  0.47 (0.02)  0.18 (0.02)  0.43 (0.02)  0.17 (0.02)  0.18 (0.01)  0.35 (0.02)  0.41 (0.02)
rpart     0.29 (?)     0.31 (?)     0.38 (0.02)  0.29 (0.03)  0.31 (?)     ?    (0.03)  0.29 (?)     ?    (?)     ?    (?)
(? marks an entry that is illegible in the source.)






Table 5: Simulation study: independent not identically distributed variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2 and
4 contain the mean of the chosen values of θ in the training sets.

n = 50
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.23 (0.07)  0.35 (0.08)  0.44 (0.04)  0.18 (0.07)  0.30 (0.09)  0.43 (0.04)  0.09 (0.05)  0.21 (0.08)  0.42 (0.07)
θ Galton  0.16 (0.13)  0.22 (0.22)  0.43 (0.26)  0.20 (0.16)  0.30 (0.23)  0.45 (0.24)  0.10 (0.17)  0.17 (0.25)  0.35 (0.31)
QCS       0.19 (0.07)  0.32 (0.08)  0.43 (0.05)  0.12 (0.07)  0.25 (0.08)  0.42 (0.05)  0.03 (0.03)  0.12 (0.05)  0.38 (0.06)
θ Skewn.  0.09 (0.09)  0.13 (0.16)  0.30 (0.26)  0.10 (0.13)  0.13 (0.16)  0.29 (0.28)  0.08 (0.13)  0.09 (0.13)  0.15 (0.24)
CC        0.40 (0.06)  0.43 (0.05)  0.45 (0.04)  0.38 (0.07)  0.42 (0.05)  0.44 (0.04)  0.30 (0.07)  0.40 (0.06)  0.45 (0.04)
MC        0.33 (0.08)  0.40 (0.06)  0.45 (0.04)  0.28 (0.07)  0.38 (0.07)  0.45 (0.04)  0.09 (0.04)  0.24 (0.06)  0.43 (0.06)
LDA       0.41 (0.06)  0.43 (0.05)  0.44 (0.04)  0.31 (0.07)  0.37 (0.08)  0.44 (0.04)  0.27 (0.07)  0.36 (0.07)  0.43 (0.05)
knn       0.43 (0.05)  0.44 (0.04)  0.45 (0.04)  0.42 (0.06)  0.44 (0.05)  0.46 (0.04)  0.41 (0.06)  0.44 (0.04)  0.46 (0.04)
n-Bayes   0.37 (0.07)  0.43 (0.06)  0.45 (0.05)  0.37 (0.07)  0.41 (0.06)  0.44 (0.05)  0.35 (0.06)  0.41 (0.06)  0.45 (0.04)
SVM       0.26 (0.06)  0.36 (0.07)  0.43 (0.05)  0.20 (0.05)  0.30 (0.06)  0.43 (0.05)  0.03 (0.03)  0.14 (0.05)  0.40 (0.06)
NSC       0.43 (0.05)  0.45 (0.04)  0.45 (0.04)  0.43 (0.05)  0.44 (0.05)  0.45 (0.04)  0.38 (0.07)  0.43 (0.05)  0.45 (0.04)
stepPlr   0.38 (0.06)  0.12 (0.06)  0.41 (0.05)  0.37 (0.06)  0.40 (0.06)  0.15 (0.04)  0.28 (0.07)  0.39 (0.06)  0.44 (0.04)
rpart     0.32 (0.09)  ?    (0.08)  0.12 (?)     0.31 (0.08)  0.33 (0.09)  0.11 (?)     0.31 (0.10)  0.32 (0.09)  ?    (0.09)
(? marks an entry that is illegible in the source.)

n = 100
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.14 (0.05)  0.26 (0.06)  0.44 (0.04)  0.09 (0.04)  0.22 (0.05)  0.43 (0.04)  0.01 (0.02)  0.08 (0.04)  0.39 (0.05)
θ Galton  0.07 (0.07)  0.09 (0.13)  0.30 (0.25)  0.11 (0.08)  0.13 (0.10)  0.29 (0.21)  0.30 (0.13)  0.25 (0.14)  0.35 (0.20)
QCS       0.13 (0.04)  0.23 (0.05)  0.44 (0.05)  0.07 (0.03)  0.19 (0.05)  0.41 (0.05)  0.01 (0.01)  0.07 (0.04)  0.36 (0.06)
θ Skewn.  0.06 (0.06)  0.06 (0.06)  0.26 (0.26)  0.07 (0.05)  0.09 (0.07)  0.18 (0.20)  0.31 (0.10)  0.24 (0.13)  0.23 (0.22)
CC        0.39 (0.05)  0.44 (0.04)  0.46 (0.03)  0.36 (0.05)  0.42 (0.04)  0.46 (0.03)  0.23 (0.04)  0.35 (0.06)  0.45 (0.03)
MC        0.30 (0.05)  0.39 (0.05)  0.45 (0.03)  0.23 (0.04)  0.34 (0.05)  0.44 (0.04)  0.05 (0.02)  0.18 (0.04)  0.41 (0.04)
LDA       0.28 (0.05)  0.37 (0.05)  0.45 (0.04)  0.41 (0.06)  0.43 (0.05)  0.46 (0.03)  0.17 (0.05)  0.27 (0.05)  0.43 (0.04)
knn       0.44 (0.04)  0.46 (0.03)  0.46 (0.03)  0.43 (0.05)  0.45 (0.03)  0.46 (0.03)  0.40 (0.06)  0.44 (0.04)  0.46 (0.03)
n-Bayes   0.36 (0.05)  0.42 (0.05)  0.46 (0.03)  0.33 (0.05)  0.41 (0.05)  0.45 (0.03)  0.26 (0.04)  0.37 (0.05)  0.45 (0.03)
SVM       0.24 (0.04)  0.33 (0.05)  0.45 (0.04)  0.15 (0.03)  0.27 (0.05)  0.43 (0.05)  0.01 (0.01)  0.08 (0.03)  0.36 (0.05)
NSC       0.43 (0.05)  0.45 (0.04)  0.46 (0.04)  0.42 (0.05)  0.44 (0.03)  0.45 (0.04)  0.34 (0.05)  0.40 (0.05)  0.45 (0.04)
stepPlr   0.32 (0.05)  0.39 (0.05)  0.45 (0.04)  0.32 (0.05)  0.40 (0.04)  0.46 (0.03)  0.21 (0.04)  0.33 (0.05)  0.45 (0.04)
rpart     0.19 (0.06)  0.22 (0.08)  0.39 (0.05)  0.17 (0.06)  0.19 (0.07)  0.31 (0.07)  0.16 (0.04)  0.17 (0.06)  0.22 (0.07)

n = 500
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.07 (0.01)  0.15 (0.02)  0.39 (0.03)  0.02 (0.01)  0.10 (0.02)  0.34 (0.03)  0.00 (0.00)  0.01 (0.00)  0.20 (0.02)
θ Galton  0.05 (0.03)  0.04 (0.04)  0.09 (0.09)  0.07 (0.02)  0.07 (0.04)  0.09 (0.09)  0.31 (0.02)  0.19 (0.05)  0.14 (0.06)
QCS       0.07 (0.01)  0.15 (0.02)  0.38 (0.03)  0.02 (0.01)  0.09 (0.02)  0.34 (0.03)  0.00 (0.00)  0.01 (0.00)  0.20 (0.02)
θ Skewn.  0.05 (0.03)  0.03 (0.03)  0.08 (0.08)  0.07 (0.03)  0.07 (0.03)  0.07 (0.08)  0.32 (0.03)  0.20 (0.05)  0.13 (0.06)
CC        0.34 (0.03)  0.40 (0.02)  0.47 (0.02)  0.29 (0.03)  0.37 (0.02)  0.46 (0.02)  0.12 (0.02)  0.23 (0.02)  0.43 (0.03)
MC        0.24 (0.02)  0.33 (0.02)  0.45 (0.02)  0.16 (0.02)  0.26 (0.02)  0.43 (0.02)  0.01 (0.01)  0.08 (0.01)  0.35 (0.02)
LDA       0.18 (0.02)  0.27 (0.02)  0.42 (0.02)  0.11 (0.02)  0.21 (0.02)  0.39 (0.02)  0.40 (0.04)  0.43 (0.03)  0.47 (0.02)
knn       0.42 (0.02)  0.46 (0.02)  0.48 (0.01)  0.41 (0.02)  0.45 (0.02)  0.48 (0.01)  0.35 (0.03)  0.42 (0.02)  0.48 (0.02)
n-Bayes   0.26 (0.02)  0.35 (0.03)  0.46 (0.02)  0.22 (0.03)  0.32 (0.02)  0.46 (0.02)  0.11 (0.02)  0.24 (0.02)  0.43 (0.02)
SVM       0.18 (0.02)  0.27 (0.02)  0.42 (0.02)  0.10 (0.01)  0.19 (0.02)  0.39 (0.02)  0.00 (0.00)  0.03 (0.01)  0.25 (0.02)
NSC       0.28 (0.03)  0.33 (0.03)  0.43 (0.03)  0.22 (0.03)  0.29 (0.03)  0.41 (0.03)  0.06 (0.01)  0.13 (0.02)  0.33 (0.03)
stepPlr   0.19 (0.02)  0.27 (0.02)  0.42 (0.02)  0.13 (0.02)  0.22 (0.02)  0.39 (0.02)  0.08 (0.01)  0.18 (0.02)  0.40 (0.02)
rpart     ?    (?)     ?    (?)     0.32 (0.02)  0.01 (?)     0.01 (?)     0.22 (0.03)  0.05 (?)     0.01 (0.01)  0.01 (?)
(? marks an entry that is illegible in the source.)






Table 6: Simulation study: dependent not identically distributed variables.
Misclassification rates (with standard errors in brackets) for different methods. Rows 2 and
4 contain the mean of the chosen values of θ in the training sets.

n = 50
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.24 (0.07)  0.35 (0.07)  0.44 (0.04)  0.20 (0.07)  0.32 (0.08)  0.44 (0.05)  0.23 (0.16)  0.27 (0.12)  0.41 (0.06)
θ Galton  0.13 (0.15)  0.20 (0.20)  0.38 (0.27)  0.10 (0.12)  0.21 (0.24)  0.48 (0.31)  0.38 (0.45)  0.30 (0.40)  0.35 (0.39)
QCS       0.21 (0.06)  0.31 (0.07)  0.43 (0.05)  0.15 (0.06)  0.26 (0.08)  0.42 (0.06)  0.20 (0.19)  0.20 (0.14)  0.38 (0.07)
θ Skewn.  0.06 (0.08)  0.11 (0.15)  0.30 (0.29)  0.05 (0.08)  0.12 (0.21)  0.29 (0.33)  0.39 (0.46)  0.26 (0.38)  0.23 (0.36)
CC        0.42 (0.06)  0.43 (0.06)  0.44 (0.04)  0.40 (0.06)  0.43 (0.05)  0.44 (0.04)  0.38 (0.07)  0.41 (0.06)  0.44 (0.04)
MC        0.35 (0.07)  0.40 (0.06)  0.45 (0.04)  0.31 (0.07)  0.39 (0.07)  0.44 (0.04)  0.25 (0.08)  0.32 (0.08)  0.42 (0.06)
LDA       0.43 (0.06)  0.44 (0.05)  0.44 (0.05)  0.32 (0.07)  0.39 (0.06)  0.44 (0.05)  0.21 (0.07)  0.33 (0.07)  0.42 (0.06)
knn       0.43 (0.05)  0.44 (0.04)  0.44 (0.04)  0.42 (0.06)  0.44 (0.04)  0.45 (0.04)  0.40 (0.06)  0.42 (0.05)  0.44 (0.04)
n-Bayes   0.38 (0.07)  0.42 (0.06)  0.44 (0.04)  0.37 (0.06)  0.41 (0.06)  0.44 (0.04)  0.33 (0.06)  0.40 (0.06)  0.44 (0.04)
SVM       0.28 (0.06)  0.36 (0.07)  0.43 (0.05)  0.22 (0.06)  0.32 (0.06)  0.44 (0.05)  0.11 (0.06)  0.20 (0.06)  0.41 (0.06)
NSC       0.44 (0.05)  0.45 (0.04)  0.46 (0.04)  0.43 (0.05)  0.44 (0.04)  0.45 (0.04)  0.40 (0.06)  0.42 (0.05)  0.43 (0.05)
stepPlr   0.39 (0.06)  0.12 (0.06)  0.45 (0.01)  0.39 (0.06)  0.42 (0.05)  0.11 (0.04)  0.33 (0.07)  0.39 (0.07)  0.41 (0.01)
rpart     0.32 (0.08)  0.31 (0.08)  0.12 (0.07)  0.32 (0.09)  0.33 (0.09)  0.10 (0.08)  0.31 (0.09)  0.32 (0.09)  0.38 (0.09)

n = 100
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.15 (0.04)  0.26 (0.05)  0.44 (0.05)  0.11 (0.04)  0.22 (0.05)  0.43 (0.05)  0.10 (0.13)  0.15 (0.10)  0.38 (0.06)
θ Galton  0.04 (0.04)  0.07 (0.08)  0.26 (0.25)  0.05 (0.04)  0.07 (0.07)  0.25 (0.26)  0.15 (0.32)  0.13 (0.26)  0.29 (0.34)
QCS       0.14 (0.04)  0.24 (0.06)  0.43 (0.05)  0.10 (0.04)  0.19 (0.05)  0.41 (0.05)  0.05 (0.07)  0.12 (0.08)  0.35 (0.06)
θ Skewn.  0.04 (0.04)  0.06 (0.08)  0.20 (0.24)  0.04 (0.02)  0.05 (0.04)  0.18 (0.25)  0.06 (0.18)  0.07 (0.19)  0.17 (0.27)
CC        0.41 (0.05)  0.44 (0.04)  0.46 (0.03)  0.39 (0.05)  0.42 (0.05)  0.46 (0.04)  0.36 (0.06)  0.40 (0.05)  0.45 (0.03)
MC        0.32 (0.05)  0.39 (0.05)  0.46 (0.03)  0.30 (0.05)  0.36 (0.06)  0.45 (0.04)  0.22 (0.07)  0.28 (0.07)  0.43 (0.05)
LDA       0.33 (0.05)  0.37 (0.05)  0.44 (0.04)  0.43 (0.05)  0.43 (0.05)  0.46 (0.03)  0.17 (0.04)  0.26 (0.05)  0.43 (0.05)
knn       0.44 (0.04)  0.45 (0.03)  0.46 (0.03)  0.43 (0.04)  0.45 (0.03)  0.46 (0.03)  0.37 (0.06)  0.43 (0.04)  0.46 (0.03)
n-Bayes   0.36 (0.05)  0.41 (0.04)  0.45 (0.03)  0.33 (0.05)  0.40 (0.05)  0.46 (0.03)  0.26 (0.06)  0.35 (0.06)  0.45 (0.04)
SVM       0.25 (0.05)  0.34 (0.05)  0.45 (0.04)  0.18 (0.05)  0.27 (0.05)  0.43 (0.05)  0.07 (0.03)  0.13 (0.04)  0.38 (0.06)
NSC       0.43 (0.04)  0.44 (0.04)  0.46 (0.04)  0.42 (0.05)  0.44 (0.04)  0.46 (0.03)  0.38 (0.05)  0.41 (0.06)  0.45 (0.04)
stepPlr   0.35 (0.05)  0.41 (0.05)  0.45 (0.04)  0.36 (0.05)  0.40 (0.05)  0.46 (0.03)  0.29 (0.05)  0.34 (0.05)  0.45 (0.04)
rpart     0.20 (0.07)  0.21 (0.06)  0.37 (0.06)  0.18 (0.06)  0.20 (0.06)  0.31 (0.07)  0.17 (0.05)  0.17 (0.06)  0.21 (0.07)

n = 500
          p = 50                                 p = 100                                p = 500
          100%         50%          10%          100%         50%          10%          100%         50%          10%
QCG       0.08 (0.02)  0.15 (0.03)  0.38 (0.03)  0.05 (0.01)  0.11 (0.02)  0.34 (0.03)  0.01 (0.00)  0.03 (0.01)  0.21 (0.03)
θ Galton  0.03 (0.02)  0.04 (0.04)  0.09 (0.07)  0.04 (0.02)  0.05 (0.04)  0.08 (0.07)  0.02 (0.00)  0.03 (0.01)  0.07 (0.05)
QCS       0.08 (0.02)  0.15 (0.02)  0.37 (0.03)  0.05 (0.01)  0.11 (0.02)  0.33 (0.03)  0.01 (0.01)  0.03 (0.01)  0.21 (0.03)
θ Skewn.  0.02 (0.01)  0.02 (0.01)  0.08 (0.08)  0.03 (0.01)  0.04 (0.02)  0.05 (0.06)  0.02 (0.00)  0.03 (0.01)  0.07 (0.05)
CC        0.38 (0.03)  0.42 (0.03)  0.47 (0.02)  0.37 (0.03)  0.40 (0.03)  0.46 (0.02)  0.36 (0.04)  0.35 (0.06)  0.44 (0.03)
MC        0.29 (0.02)  0.35 (0.03)  0.45 (0.02)  0.26 (0.03)  0.31 (0.03)  0.44 (0.02)  0.20 (0.04)  0.23 (0.06)  0.38 (0.04)
LDA       0.22 (0.02)  0.29 (0.02)  0.42 (0.02)  0.18 (0.02)  0.23 (0.02)  0.39 (0.02)  0.43 (0.04)  0.44 (0.03)  0.47 (0.02)
knn       0.42 (0.02)  0.46 (0.02)  0.48 (0.01)  0.40 (0.02)  0.44 (0.02)  0.48 (0.01)  0.35 (0.03)  0.41 (0.02)  0.47 (0.02)
n-Bayes   0.27 (0.03)  0.36 (0.03)  0.46 (0.02)  0.24 (0.03)  0.33 (0.03)  0.46 (0.02)  0.15 (0.03)  0.24 (0.04)  0.43 (0.02)
SVM       0.20 (0.02)  0.28 (0.02)  0.42 (0.02)  0.14 (0.01)  0.21 (0.02)  0.38 (0.02)  0.04 (0.01)  0.05 (0.01)  0.25 (0.02)
NSC       0.30 (0.03)  0.35 (0.03)  0.43 (0.03)  0.26 (0.03)  0.32 (0.03)  0.42 (0.03)  0.19 (0.04)  0.22 (0.05)  0.35 (0.04)
stepPlr   0.22 (0.02)  0.29 (0.02)  0.42 (0.02)  0.19 (0.02)  0.21 (0.02)  0.10 (0.02)  0.19 (0.02)  0.22 (0.02)  0.39 (0.02)
rpart     ?    (?)     0.07 (?)     0.32 (0.03)  0.01 (?)     0.01 (?)     0.22 (0.03)  0.05 (?)     0.01 (0.01)  0.01 (?)
(? marks an entry that is illegible in the source.)






5.2 Real data example 

For illustration, we apply the quantile classifier to a data set from chemistry. These data
were collected while testing a new method to detect bioaerosol particles based on gaseous
plasma electrochemistry. The presence of such particles in air has a big impact on health,
but monitoring bioaerosols poses great technical challenges. Sarantaridis et al. (2012)
attempted to tell several different bioaerosols apart based on voltage changes over time on
eight different electrodes when particles passed a premixed laminar hydrogen/oxygen/nitrogen
flame.

The resulting data are eight time series with 301 observations each for each particle.
Sarantaridis et al. (2012) discussed how the relevant information in every time series can
be summarised in six characteristic features, namely

1. Maximum voltage in series. 

2. Minimum voltage in series. 

3. Maximum voltage change caused by electrode. 

4. Difference between final and initial voltage. 

5. Length of positive change caused by the electrode. 

6. Length of negative change caused by the electrode. 



Details are given in Sarantaridis et al. (2012). A seventh variable (time point of
maximum change) was also used there, which we omit here. Although in Sarantaridis et al.
(2012) it contributed to the classification, the chemists (personal communication)
suspected this to be an artifact because knowledge of the experiment suggests that this
variable is caused by experimental features other than the type of the bioaerosol. We are
therefore left with 48 variables (six for each of the eight electrodes).

These variables need to be standardised because the quantile classifier (like a number
of other classifiers) is sensitive to measurement scales. How exactly standardisation is
carried out should depend on the application. Researchers often use standard schemes for
standardisation such as dividing by variable-wise standard deviations. In the current
example, however, we apply a different scheme, which is motivated by the expectation of
the chemists that the size of variation in voltage and length of effect is actually informative
and that electrodes and variables for which the electrode causes stronger variation are
actually more important for discrimination. Variable-wise standardisation would remove such
information. Still, the variables 1-4 (voltages) on the one hand and 5-6 (effect lengths) on
the other hand do not have comparable measurement units. Therefore we computed one
standard deviation from all 8 * 4 voltage variables and standardised all these variables by
the same standard deviation, and the 8 * 2 effect length variables were also standardised
by the standard deviation computed from all of them combined.
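
As a minimal sketch of this group-wise standardisation (an illustration, not the authors' code), the Python fragment below divides every variable by a single standard deviation computed from all values of its group. The function name pooled_standardise, the group labels, and the reading of "one standard deviation from all voltage variables" as the standard deviation of the pooled values are assumptions made for this example.

```python
from statistics import stdev


def pooled_standardise(rows, groups):
    """Divide each column by the standard deviation pooled over its group.

    rows   -- list of observations, each a list of floats
    groups -- one (hypothetical) group label per column, e.g. "voltage" or "length"
    """
    # Collect the values of all columns that share a group label.
    pooled = {}
    for row in rows:
        for value, group in zip(row, groups):
            pooled.setdefault(group, []).append(value)
    # One standard deviation per group, computed from the combined values
    # (assumption: values are pooled around one common mean).
    scale = {group: stdev(values) for group, values in pooled.items()}
    return [[value / scale[group] for value, group in zip(row, groups)]
            for row in rows]


# Toy data: two "voltage" columns and one "length" column.
data = [[1.0, 10.0, 100.0], [3.0, 30.0, 300.0]]
scaled = pooled_standardise(data, ["voltage", "voltage", "length"])
```

Because both voltage columns are divided by the same number, the second one remains ten times as variable as the first after scaling, which is the information that variable-wise standardisation would have removed.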

We confine ourselves to the classification problem of distinguishing between two bioaerosols, 
namely Bermuda Smut Spores and Black Walnut Pollen. For each bioaerosol there were 
data from thirty particles. 

The quantile classifier was applied to data without preprocessing and to data with sign
adjustments according to the conventional skewness and its robust Galton version. We
used leave-one-out cross-validation to assess the performance of the classifier. Within each
fold we selected the optimal θ in the training set. Table 7 contains the misclassification
rates of the quantile classifier according to the different preprocessing strategies. We
also evaluated other discriminant methods: the component-wise centroid and median
classifiers, linear and quadratic discriminant analysis, the k-nearest-neighbor classifier
with k = 5, the naive Bayes classifier, the support vector machine, the nearest-shrunken
centroid method, penalized logistic regression and classification trees.

Table 7: Leave-one-out cross-validated misclassification rates of the bioaerosol particles
data. Standard errors are reported in brackets.

Method                        Misclassification rate
QC (no skewness correction)   0.133 (0.044)
QCG                           0.033 (0.023)
QCS                           0.117 (0.042)
CC                            0.217 (0.054)
MC                            0.267 (0.058)
LDA                           0.067 (0.032)
knn                           0.150 (0.046)
n-Bayes                       0.150 (0.046)
SVM                           0.100 (0.039)
NSC                           0.267 (0.058)
stepPlr                       0.100 (0.039)
rpart                         0.400 (0.064)

It can be seen from these results that the quantile classifier with Galton skewness
correction is particularly effective for classifying the two bioaerosols and outperforms the
other methods. Only two particles are misclassified.
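
The rates and standard errors in Table 7 can be reproduced with a short calculation. The sketch below assumes, which the text does not state, that each standard error is the sample standard deviation of the 60 leave-one-out 0/1 losses (with the n - 1 denominator) divided by the square root of n = 60; the helper name rate_and_se is hypothetical.

```python
from math import sqrt


def rate_and_se(errors, n):
    """Misclassification rate and its standard error from n 0/1 losses."""
    rate = errors / n
    # Sample variance of n Bernoulli losses, using the n - 1 denominator.
    variance = rate * (1.0 - rate) * n / (n - 1)
    return rate, sqrt(variance / n)


# Two misclassified particles out of 60 (30 per bioaerosol), as for QCG.
rate, se = rate_and_se(2, 60)
print(f"{rate:.3f} ({se:.3f})")  # prints 0.033 (0.023)
```

Under the same assumption, 8, 16 and 24 errors out of 60 give 0.133 (0.044), 0.267 (0.058) and 0.400 (0.064), matching the corresponding rows of the table.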

It is worth noting that the sign adjustment preprocessing step is particularly relevant. If
no sign adjustment is performed, the choice of the optimal quantile value is more variable
across the cross-validated sets (and closer to the midpoint on average) because of the
possibly different directions of skewness in the observed variables. In this case, when
data are preprocessed according to the Galton skewness correction, the selected optimal
θ across the cross-validated sets is always extremely small, with an average of 0.04. This
means that more discriminant information between the two bioaerosols is contained in
the left tail of the observed distributions than in their "core".
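
The sign adjustment can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: Galton skewness is taken to be the quartile-based measure (Q3 + Q1 - 2*Q2) / (Q3 - Q1), quartiles are computed with the "inclusive" interpolation of Python's statistics module, and variables with negative skewness are multiplied by -1 so that all variables are skewed in the same direction. The function names are hypothetical.

```python
from statistics import quantiles


def galton_skewness(values):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Raises ZeroDivisionError if the first and third quartiles coincide.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def sign_adjust(columns):
    """Multiply each column with negative Galton skewness by -1."""
    adjusted = []
    for column in columns:
        sign = -1.0 if galton_skewness(column) < 0 else 1.0
        adjusted.append([sign * value for value in column])
    return adjusted


# A left-skewed and a right-skewed toy variable; only the first is flipped.
flipped = sign_adjust([[-20, -10, -3, -2, -1], [1, 2, 3, 10, 20]])
```

In a cross-validation setting the signs would be determined on the training part of each fold only; that detail is omitted here.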



6 Conclusion 

The idea of the componentwise quantile classifier was inspired by the componentwise
median classifier in Hall et al. (2009). The simulations and the application show that the
quantile classifier can compete with the median classifier in the (symmetric) situations
where the median classifier is best, but is much better for asymmetric and mixed variables
due to its greater flexibility. It also compares very favourably to all the other classifiers
tested in the present work.
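
For readers who want to experiment, the componentwise quantile classifier can be sketched in a few lines of Python. This is a simplified illustration under assumptions, not the implementation used for the results above: the weighted distance is taken to be the usual quantile ("check") loss, within-class quantiles are computed by linear interpolation, and θ is chosen from a user-supplied grid by minimising the training misclassification rate. All function names are hypothetical.

```python
def check_loss(u, theta):
    """Quantile loss: theta * u for u >= 0 and (theta - 1) * u for u < 0."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def quantile(values, theta):
    """Empirical theta-quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = theta * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def fit(X, y, theta):
    """Return {class label: list of within-class theta-quantiles, one per variable}."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [quantile(column, theta) for column in zip(*rows)]
    return model


def predict(model, x, theta):
    """Assign x to the class with the smallest summed componentwise distance."""
    def distance(q):
        return sum(check_loss(xj - qj, theta) for xj, qj in zip(x, q))
    return min(model, key=lambda label: distance(model[label]))


def best_theta(X, y, grid):
    """Grid value with the lowest training misclassification rate (first on ties)."""
    def training_error(theta):
        model = fit(X, y, theta)
        wrong = sum(predict(model, x, theta) != lab for x, lab in zip(X, y))
        return wrong / len(y)
    return min(grid, key=training_error)


# Toy example with one variable and two well-separated classes.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = ["a", "a", "a", "b", "b", "b"]
theta = best_theta(X, y, [0.1, 0.5, 0.9])
label = predict(fit(X, y, theta), [3.0], theta)
```

With θ = 0.5 the rule reduces to a componentwise median classifier based on absolute distances, which is the symmetric special case mentioned above.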

Basic issues with the componentwise quantile classifier are that it ignores the correlation 
structure (although this does not seem to do much harm in the simulations with dependent 
variables) and that it requires scaling of the variables because it is not scale equivariant. 
Like all distance-based classifiers, it does not require the classification information to be 
concentrated on a much lower dimensional space. 
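To make the componentwise structure concrete, the following is a minimal sketch of a 
quantile classifier with θ chosen by minimizing the training misclassification rate. It 
assumes that the variables have already been scaled, that all variables receive equal 
weight, and that the distance to a quantile q is the asymmetric absolute loss θ(x - q) for 
x > q and (1 - θ)(q - x) otherwise. All function names are hypothetical, and the sketch is 
not the authors' implementation. 

```python
def quantile(values, theta):
    """Empirical theta-quantile, interpolating linearly between order statistics."""
    s = sorted(values)
    pos = theta * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def quantile_distance(x, q, theta):
    """Asymmetric absolute distance of x from the quantile q (assumed form)."""
    return theta * (x - q) if x > q else (1.0 - theta) * (q - x)


def fit(X, y, theta):
    """Within-class theta-quantiles of every variable. X: list of rows, y: labels."""
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        model[label] = [quantile(col, theta) for col in zip(*rows)]
    return model


def predict(model, row, theta):
    """Assign the class with the smallest sum of componentwise distances."""
    def total(label):
        return sum(quantile_distance(x, q, theta)
                   for x, q in zip(row, model[label]))
    return min(model, key=total)


def select_theta(X, y, grid):
    """Pick the theta on the grid with the lowest training misclassification rate."""
    def error(theta):
        model = fit(X, y, theta)
        wrong = sum(predict(model, r, theta) != lab for r, lab in zip(X, y))
        return wrong / len(y)
    return min(grid, key=error)


# Toy data: two variables, two classes separated mainly in the first variable.
X = [[0.0, 1.0], [0.2, 1.5], [0.1, 3.0], [2.0, 1.2], [2.3, 1.4], [2.1, 4.0]]
y = [0, 0, 0, 1, 1, 1]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_theta = select_theta(X, y, grid)
model = fit(X, y, best_theta)
fitted = [predict(model, row, best_theta) for row in X]
```

Because the decision rule is a plain sum over variables, no covariance estimate enters 
anywhere, which is where the correlation structure is ignored. Rescaling one column 
changes its contribution to the sum, which is why prior scaling matters. 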

As mentioned in Remark 3, different θ-values may be optimal for different variables, 
but our attempt to estimate a different θ for every variable was not successful. A more 
computationally complex scheme could improve things. 

The connection between skewness and the choice of θ, which we exploited successfully by 
the skewness correction, is theoretically not straightforward. It is not possible to say 
that, for example, right-skewed distributions will always yield an optimal θ on the same 
side of 0.5, although our experience suggests that this is often the case. An alternative 
that we tried was to estimate a different θ for every variable, not for classification 
itself, but in order to multiply by -1 those variables that yielded θ* > 0.5. This was 
meant as a substitute for the skewness correction that is more directly tailored to the 
idea of the quantile classifier, so that the overall optimal θ would not have to be a 
compromise between vastly different variable-wise choices. It led, however, to worse 
classification results on average than the skewness correction. Future research could 
still explore how to "unify" the variables optimally, if possible with theoretical 
justification. 

References 

Bellman, R. (1961). Adaptive Control Processes. Princeton University Press. 

Bensmail, H. and G. Celeux (1996). Regularized Gaussian discriminant analysis through 
eigenvalue decomposition. Journal of the American Statistical Association 91, 1743-1748. 

Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984). Classification and Regression 
Trees. Belmont CA: Wadsworth. 

Cai, T. and W. Liu (2011). A direct estimation approach to sparse linear discriminant 
analysis. Journal of the American Statistical Association 106(496), 1566-1577. 

Cortes, C. and V. Vapnik (1995). Support-Vector Networks. Machine Learning 20, 273- 
297. 

Cover, T. and P. Hart (1967). Nearest neighbor pattern classification. IEEE Transactions 
on Information Theory 13, 21-27. 

Dabney, A. (2005). Classification of microarrays to nearest centroids. Bioinformatics 21, 
4148-4154. 

Dudoit, S., J. Fridlyand, and T. Speed (2002). Comparison of discrimination methods 
for the classification of tumors using gene expression data. Journal of the American 
Statistical Association 97, 77-87. 

Fan, J. and Y. Fan (2008). High dimensional classification using features annealed inde- 
pendence rules. The Annals of Statistics 36, 2605-2637. 

Fraley, C. and A. Raftery (2002). Model-based clustering, discriminant analysis, and 
density estimation. Journal of the American Statistical Association 97, 611-631. 

Friedman, J. (1989). Regularized discriminant analysis. Journal of the American Statis- 
tical Association 84, 165-175. 






Ghosh, A. and P. Chaudhuri (2005). On Data Depth and Distribution-Free Discriminant 
Analysis Using Separating Surfaces. Bernoulli 11, 1-27. 

Guo, Y., T. Hastie, and R. Tibshirani (2007). Regularized linear discriminant analysis 
and its application in microarrays. Biostatistics 8, 86-100. 

Hall, P., D. M. Titterington, and J.-H. Xue (2009). Median-based classifiers for high- 
dimensional data. Journal of the American Statistical Association 104, 1597-1608. 

Hand, D. and K. Yu (2001). Idiot's Bayes - Not so Stupid After All? International 
Statistical Review 69, 385-398. 

Hand, D. J. (1997). Construction and Assessment of Classification Rules. John Wiley & 
Sons. 

Hastie, T. and R. Tibshirani (1996). Discriminant analysis by Gaussian mixtures. Journal 
of the Royal Statistical Society - Series B 58, 155-176. 

Hinkley, D. (1975). On power transformations to symmetry. Biometrika 62, 101-111. 

John, G. and P. Langley (1995). Estimating Continuous Distributions in Bayesian Classi- 
fiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 
pp. 338-345. 

Jörnsten, R. (2004). Clustering and Classification based on the L1 Data Depth. Journal 
of Multivariate Analysis 91, 67-89. 

Mason, D. M. (1982). Some characterizations of almost sure bounds for weighted multidi- 
mensional empirical distributions and a Glivenko-Cantelli theorem for sample quantiles. 
Zeitschrift fuer Wahrscheinlichkeitstheorie und verwandte Gebiete 50, 505-513. 

Mika, S., G. Rätsch, J. Weston, B. Schölkopf, and K. Müller (1999). Fisher discriminant 
analysis with kernels. In IEEE Neural Networks for Signal Processing Workshop, pp. 
41-48. 

Park, M. and T. Hastie (2008). Penalized logistic regression for detecting gene interactions. 
Biostatistics 9, 30-50. 

Ripley, B. D. (1994). Neural networks and related methods for classification. Journal of the 
Royal Statistical Society - Series B 56, 409-456. 

Sarantaridis, D., C. Hennig, and D. J. Caruana (2012). Bioaerosol detection using poten- 
tiometric tomography in flames. Chemical Science 3, 2210-2216. 

Tibshirani, R., T. Hastie, B. Narasimhan, and G. Chu (2002). Diagnosis of Multiple 
Cancer Types by Shrunken Centroids of Gene Expression. Proceedings of the National 
Academy of Sciences of the United States of America 99, 6567-6572. 

Tibshirani, R., T. Hastie, B. Narasimhan, and G. Chu (2003). Class Prediction by Nearest 
Shrunken Centroids, with application to DNA Microarray. Statistical Science 18, 104- 
117. 






Wang, L., J. Zhu, and H. Zou (2008). Hybrid Huberized Support Vector Machines for 
Microarray Classification and Gene Selection. Bioinformatics 24, 412-419. 

Wang, S. and J. Zhu (2007). Improved Centroids Estimation for the Nearest Shrunken 
Centroid Classifier. Bioinformatics 23, 972-979. 






